Systems and methods for detecting and scoring anomalies

ABSTRACT

Systems and methods for detecting and scoring anomalies. In some embodiments, a method is provided, comprising acts of: (A) identifying a plurality of values of an attribute, each value of the plurality of values corresponding respectively to a digital interaction of a plurality of digital interactions; (B) dividing the plurality of values into a plurality of buckets; (C) for at least one bucket of the plurality of buckets, determining a count of values from the plurality of values that fall within the at least one bucket; (D) comparing the count of values from the plurality of values that fall within the at least one bucket against historical information regarding the attribute; and (E) determining whether the attribute is anomalous based at least in part on a result of the act (D).

RELATED APPLICATION

This application claims the benefit under 35 U.S.C. §119 of U.S. Provisional Patent Application No. 62/214,969, filed on Sep. 5, 2015, which is hereby incorporated by reference in its entirety.

This application is filed on the same day as application Ser. No. ______, entitled “SYSTEMS AND METHODS FOR MATCHING AND SCORING SAMENESS,” bearing Attorney Docket No. L0702.70006US00, and application Ser. No. ______, entitled “SYSTEMS AND METHODS FOR DETECTING AND PREVENTING SPOOFING,” bearing Attorney Docket No. L0702.70003US01. Each of these applications is hereby incorporated by reference in its entirety.

BACKGROUND

A large organization with an online presence often receives tens of thousands of requests per minute to initiate digital interactions. A security system supporting multiple large organizations may handle millions of digital interactions at the same time, and the total number of digital interactions analyzed by the security system each week may easily exceed one billion.

As organizations increasingly demand real-time results, a security system must analyze a large amount of data and accurately determine whether a digital interaction is legitimate, all within fractions of a second. This presents tremendous technical challenges, especially given the large overall volume of digital interactions handled by the security system.

SUMMARY

In accordance with some embodiments, a computer-implemented method is provided for analyzing a plurality of digital interactions, the method comprising acts of: (A) identifying a plurality of values of an attribute, each value of the plurality of values corresponding respectively to a digital interaction of the plurality of digital interactions; (B) dividing the plurality of values into a plurality of buckets; (C) for at least one bucket of the plurality of buckets, determining a count of values from the plurality of values that fall within the at least one bucket; (D) comparing the count of values from the plurality of values that fall within the at least one bucket against historical information regarding the attribute; and (E) determining whether the attribute is anomalous based at least in part on a result of the act (D).

In accordance with some embodiments, a computer-implemented method is provided for analyzing a digital interaction, the method comprising acts of: identifying a plurality of attributes from a profile; for each attribute of the plurality of attributes, determining whether the digital interaction matches the profile with respect to the attribute, comprising: identifying, from the profile, at least one bucket of possible values of the attribute, the at least one bucket being indicative of anomalous behavior; identifying, from the digital interaction, a value of the attribute; and determining whether the value identified from the digital interaction falls into the at least one bucket, wherein the digital interaction is determined to match the profile with respect to the attribute if it is determined that the value identified from the digital interaction falls into the at least one bucket; and determining a penalty score based at least in part on a count of attributes with respect to which the digital interaction matches the profile.

In accordance with some embodiments, a computer-implemented method is provided for analyzing a digital interaction, the method comprising acts of: determining whether the digital interaction is suspicious; in response to determining that the digital interaction is suspicious, deploying a security probe of a first type to collect first data from the digital interaction; analyzing first data collected from the digital interaction by the security probe of the first type to determine if the digital interaction continues to appear suspicious; if the first data collected from the digital interaction by the security probe of the first type indicates that the digital interaction continues to appear suspicious, deploying a security probe of a second type to collect second data from the digital interaction; and if the first data collected from the digital interaction by the security probe of the first type indicates that the digital interaction no longer appears suspicious, deploying a security probe of a third type to collect third data from the digital interaction.

In accordance with some embodiments, a system is provided, comprising at least one processor and at least one computer-readable storage medium having stored thereon instructions which, when executed, program the at least one processor to perform any of the above methods.

In accordance with some embodiments, at least one computer-readable storage medium is provided, having stored thereon instructions which, when executed, program at least one processor to perform any of the above methods.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1A shows an illustrative system 10 via which digital interactions may take place, in accordance with some embodiments.

FIG. 1B shows an illustrative security system 14 for processing data collected from digital interactions, in accordance with some embodiments.

FIG. 1C shows an illustrative flow 40 within a digital interaction, in accordance with some embodiments.

FIG. 2A shows an illustrative data structure 200 for recording observations from a digital interaction, in accordance with some embodiments.

FIG. 2B shows an illustrative data structure 220 for recording observations from a digital interaction, in accordance with some embodiments.

FIG. 2C shows an illustrative process 230 for recording observations from a digital interaction, in accordance with some embodiments.

FIG. 3 shows illustrative attributes that may be monitored by a security system, in accordance with some embodiments.

FIG. 4 shows an illustrative process 400 for detecting anomalies, in accordance with some embodiments.

FIG. 5 shows an illustrative technique for dividing a plurality of numerical attribute values into a plurality of ranges, in accordance with some embodiments.

FIG. 6 shows an illustrative hash-modding technique for dividing numerical and/or non-numerical attribute values into buckets, in accordance with some embodiments.

FIG. 7A shows an illustrative histogram 700 representing a distribution of numerical attribute values among a plurality of buckets, in accordance with some embodiments.

FIG. 7B shows an illustrative histogram 720 representing a distribution of attribute values among a plurality of buckets, in accordance with some embodiments.

FIG. 8A shows an illustrative expected histogram 820 representing a distribution of attribute values among a plurality of buckets, in accordance with some embodiments.

FIG. 8B shows a comparison between the illustrative histogram 720 of FIG. 7B and the illustrative expected histogram 820 of FIG. 8A, in accordance with some embodiments.

FIG. 9 shows illustrative time periods 902 and 904, in accordance with some embodiments.

FIG. 10 shows an illustrative normalized histogram 1000, in accordance with some embodiments.

FIG. 11 shows an illustrative array 1100 of histograms over time, in accordance with some embodiments.

FIG. 12 shows an illustrative profile 1200 with multiple anomalous attributes, in accordance with some embodiments.

FIG. 13 shows an illustrative process 1300 for detecting anomalies, in accordance with some embodiments.

FIG. 14 shows an illustrative process 1400 for matching a digital interaction to a fuzzy profile, in accordance with some embodiments.

FIG. 15 shows an illustrative fuzzy profile 1500, in accordance with some embodiments.

FIG. 16 shows an illustrative fuzzy profile 1600, in accordance with some embodiments.

FIG. 17 shows an illustrative process 1700 for dynamic security probe deployment, in accordance with some embodiments.

FIG. 18 shows an illustrative cycle 1800 for updating one or more segmented lists, in accordance with some embodiments.

FIG. 19 shows an illustrative process 1900 for dynamically deploying multiple security probes, in accordance with some embodiments.

FIG. 20 shows an example of a decision tree 2000 that may be used by a security system to determine whether to deploy a probe and/or which one or more probes are to be deployed, in accordance with some embodiments.

FIG. 21 shows, schematically, an illustrative computer 5000 on which any aspect of the present disclosure may be implemented.

DETAILED DESCRIPTION

Aspects of the present disclosure relate to systems and methods for detecting and scoring anomalies.

In a distributed attack on a web site or application, an attacker may coordinate multiple computers to carry out the attack. For example, the attacker may launch the attack using a “botnet.” In some instances, the botnet may include a network of virus-infected computers that the attacker may control remotely.

The inventors have recognized and appreciated various challenges in detecting web attacks. For instance, in a distributed attack, the computers involved may be located throughout the world, and may have different characteristics. As a result, it may be difficult to ascertain which computers are involved in the same attack. Additionally, in an attempt to evade detection, a sophisticated attacker may modify the behavior of each controlled computer slightly so that no consistent behavior profile may be easily discernible across the attack. Accordingly, in some embodiments, anomaly detection techniques are provided with improved effectiveness against an attack in which the participating computers exhibit different behaviors.

I. Dynamically Generated Fuzzy Profiles

Some security systems use triggers that trip on certain observed behaviors. For example, a trigger may be a pattern comprising an e-commerce user making a high-value order and shipping to a new address, or a new account making several orders with different credit card numbers. When one of these suspicious patterns is detected, an alert may be raised with respect to the user or account, and/or an action may be taken (e.g., suspending the transaction or account). However, the inventors have recognized and appreciated that a trigger-based system may produce false positives (e.g., a trigger tripping on a legitimate event) and/or false negatives (e.g., triggers not tripped during an attack, or being tripped too late, when significant damage has been done).

Accordingly, in some embodiments, anomaly detection techniques are provided with reduced false positive rate and/or false negative rate. For example, one or more fuzzy profiles may be created. When an observation is made from a digital interaction, a score may be derived for each fuzzy profile, where the score is indicative of an extent to which the observation matches the fuzzy profile. In some embodiments, such scores may be derived in addition to, or instead of, Boolean outputs of triggers as described above, and may provide a more nuanced set of data points for a decision logic that determines what, if any, action is to be taken in response to the observation.

The inventors have recognized and appreciated that, although many attacks exhibit known suspicious patterns, it may take time for such patterns to emerge. For instance, an attacker may gain control of multiple computers that are seemingly unrelated (e.g., computers that are associated with different users, different network addresses, different geographic locations, etc.), and may use the compromised computers to carry out an attack simultaneously. As a result, damage may have been done by the time any suspicious pattern is detected.

The inventors have recognized and appreciated that a security system may be able to flag potential attacks earlier by looking for anomalies that emerge in real time, rather than suspicious patterns that are defined ahead of time. For instance, in some embodiments, a security system may monitor digital interactions taking place at a particular web site and compare what is currently observed against what was observed previously at the same web site. As one example, the security system may compare a certain statistic (e.g., a count of digital interactions reporting a certain browser type) from a current time period (e.g., 30 minutes, one hour, 90 minutes, two hours, etc.) against the same statistic from a past time period (e.g., the same time period a day ago, a week ago, a month ago, a year ago, etc.). If the current value of the statistic deviates significantly from the past value of the statistic (e.g., by more than a selected threshold amount), an anomaly may be reported. In this manner, anomalies may be defined dynamically, based on activity patterns at the particular web site. Such flexibility may reduce false positive and/or false negative errors. Furthermore, the security system may be able to detect attacks that do not exhibit any known suspicious pattern, and such detection may be possible before significant damage has been done.
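
By way of a non-limiting sketch, the comparison described above may be expressed as follows (Python; the get_count helper, the choice of time periods, and the 50% relative threshold are illustrative assumptions rather than part of any embodiment):

    # Sketch: report an anomaly when a current statistic deviates from the
    # same statistic in a past time period by more than a selected
    # threshold. get_count() is a hypothetical data-access helper that
    # returns, e.g., a count of digital interactions reporting a given
    # browser type during a time period.

    def is_anomalous(attribute, value, current_period, past_period,
                     get_count, threshold=0.5):
        current = get_count(attribute, value, current_period)
        past = get_count(attribute, value, past_period)
        if past == 0:
            # A value with no history may itself be treated as anomalous.
            return current > 0
        return abs(current - past) / past > threshold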

II. Techniques for Efficient Processing and Representation of Data

The inventors have recognized and appreciated that a security system for detecting and scoring anomalies may process an extremely large amount of data. For instance, a security system may analyze digital interactions for multiple large organizations. The web site of each organization may handle hundreds of digital interactions per second, so that the security system may receive thousands, tens of thousands, or hundreds of thousands of requests per second to detect anomalies. In some instances, a few megabytes of data may be captured from each digital interaction (e.g., URL being accessed, user device information, keystroke recording, etc.) and, in evaluating the captured data, the security system may retrieve and analyze a few megabytes of historical, population, and/or other data. Thus, the security system may analyze a few gigabytes of data per second just to support 1000 requests per second. Accordingly, in some embodiments, techniques are provided for aggregating data to facilitate efficient storage and/or analysis.

Some security systems perform a security check only when a user takes a substantive action, such as changing one or more access credentials (e.g., account identifier, password, etc.), changing contact information (e.g., email address, phone number, etc.), changing shipping address, making a purchase, etc. The inventors have recognized and appreciated that such a security system may have collected little information by the time the security check is initiated. Accordingly, in some embodiments, a security system may begin to analyze a digital interaction as soon as an entity arrives at a web site. For instance, the security system may begin collecting data from the digital interaction before the entity even attempts to log into a certain account. In some embodiments, the security system may compare the entity's behaviors against population data. In this manner, the security system may be able to draw some inferences as to whether the entity is likely a legitimate user, or a bot or human fraudster, before the entity takes any substantive action. Various techniques are described herein for performing such analyses in real time for a high volume of digital interactions.

In some embodiments, a number of attributes may be selected for a particular web site, where an attribute may be a question that may be asked about a digital interaction, and a value for that attribute may be an answer to the question. As one example, a question may be, “how much time elapsed between viewing a product and checking out?” An answer may be a value (e.g., in seconds or milliseconds) calculated based on a timestamp of a request for a product details page and a timestamp of a request for a checkout page. As another example, an attribute may include an anchor type that is observable from a digital interaction. For instance, a security system may observe that data packets received in connection with a digital interaction indicate a certain source network address and/or a certain source device identifier. Additionally, or alternatively, the security system may observe that a certain email address is used to log in and/or a certain credit card is charged in connection with the digital interaction. Examples of anchor types include, but are not limited to, account identifier, email address (e.g., user name and/or email domain), network address (e.g., IP address, sub address, etc.), phone number (e.g., area code and/or subscriber number), location (e.g., GPS coordinates, continent, country, territory, city, designated market area, etc.), device characteristic (e.g., brand, model, operating system, browser, device fingerprint, etc.), device identifier, etc.

In some embodiments, a security system may maintain one or more counters for each possible value (e.g., Chrome, Safari, etc.) of an attribute (e.g., browser type). For instance, a counter for a possible attribute value (e.g., Chrome) may keep track of how many digital interactions with that particular attribute value (e.g., Chrome) are observed within some period of time (e.g., 30 minutes, one hour, 90 minutes, two hours, etc.). Thus, to determine if there is an anomaly associated with an attribute, the security system may simply examine one or more counters. For instance, if the current time is 3:45 pm, the security system may compare a counter keeping track of the number of digital interactions reporting a Chrome browser since 3:00 pm, against a counter keeping track of the number of digital interactions reporting a Chrome browser between 3:00 pm and 4:00 pm on the previous day (or a week ago, a month ago, a year ago, etc.). This may eliminate or at least reduce on-the-fly processing of raw data associated with the attribute values, thereby improving responsiveness of the security system.
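
A minimal sketch of such time-windowed counters follows (Python; the fixed one-hour windows and all names are illustrative assumptions):

    from collections import defaultdict

    WINDOW_SECONDS = 3600  # one counter per (attribute value, one-hour window)

    counters = defaultdict(int)  # (value, window_start) -> count

    def window_start(timestamp):
        # Floor the timestamp to the start of its window.
        return int(timestamp) - int(timestamp) % WINDOW_SECONDS

    def record_observation(value, timestamp):
        counters[(value, window_start(timestamp))] += 1

    def count_in_window(value, timestamp):
        return counters[(value, window_start(timestamp))]

    # Comparing the 3:00-4:00 pm window today against the same window
    # yesterday then reduces to two dictionary lookups:
    #   count_in_window("Chrome", now) vs. count_in_window("Chrome", now - 86400)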

The inventors have recognized and appreciated that, as the volume of digital interactions processed by a security system increases, the collection of counters maintained by the security system may become unwieldy. Accordingly, in some embodiments, possible values of an attribute may be divided into a plurality of buckets. Rather than maintaining one or more counters for each attribute value, the security system may maintain one or more counters for each bucket of attribute values. For instance, a counter may keep track of a number of digital interactions with any network address from a bucket B of network addresses, as opposed to a number of digital interactions with a particular network address Y. Thus, multiple counters (e.g., a separate counter for each attribute value in the bucket B) may be replaced with a single counter (e.g., an aggregate counter for all attribute values in the bucket B).

In this manner, a desired balance between precision and efficiency may be achieved by selecting an appropriate number of buckets. For instance, a larger number of buckets may provide a higher resolution, but more counters may be maintained and updated, whereas a smaller number of buckets may reduce storage requirements and speed up retrieval and updates, but more information may be lost.

The inventors have recognized and appreciated that it may be desirable to spread attribute values roughly evenly across a plurality of buckets. Accordingly, in some embodiments, a hash function may be applied to attribute values, and a modulo operation may be applied to divide the resulting hashes into a plurality of buckets, where there may be one bucket for each residue of the modulo operation. An appropriate modulus may be chosen based on how many buckets are desired, and an appropriate hash function may be chosen to spread the attribute values roughly evenly across possible hashes. Examples of suitable hash functions include, but are not limited to, MD5, MD6, SHA-1, SHA-2, SHA-3, etc.
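
As a non-limiting sketch, hash-modding may be implemented as follows (Python; the choices of MD5 and of 100 buckets are illustrative, and the hash is used only to spread values evenly, not for any cryptographic purpose):

    import hashlib

    NUM_BUCKETS = 100

    def bucket_of(value: str, num_buckets: int = NUM_BUCKETS) -> int:
        # Hash the attribute value, then take the residue modulo the
        # number of buckets; each residue corresponds to one bucket.
        digest = hashlib.md5(value.encode("utf-8")).hexdigest()
        return int(digest, 16) % num_buckets

    # Two different user agents usually land in different buckets:
    print(bucket_of("Mozilla/5.0 (Windows NT 10.0) Chrome/45.0"))
    print(bucket_of("Mozilla/5.0 (Macintosh) Safari/601.1"))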

For example, there may be tens of thousands of possible user agents. The inventors have recognized and appreciated that it may not be important to precisely keep track of which user agents have been seen. Therefore, it may be sufficient to apply a hash-modding technique to divide the tens of thousands of possible user agents into, say, a hundred or fewer buckets. In this manner, if multiple user agents have been seen, there may be a high probability of multiple buckets being hit, which may provide sufficient information for anomaly detection.

III. Dynamically Deployed Security Probes

Some security systems flag all suspicious digital interactions for manual review, which may cause delays in sending acknowledgements to users. Moderate delays may be acceptable to organizations selling physical goods over the Internet, because for each order there may be a time window during which the ordered physical goods are picked from a warehouse and packaged for shipment, and a manual review may be conducted during that time window. However, many digital interactions involve the sale of digital goods (e.g., music, games, etc.), transfer of funds, etc. For such digital interactions, a security system may be expected to respond to each request in real time, for example, within hundreds or tens of milliseconds. Such quick responses may improve user experience. For instance, a user making a transfer or ordering a song, game, etc. may wish to receive real-time confirmation that the transaction has gone through. Accordingly, in some embodiments, techniques are provided for automatically investigating suspicious digital interactions, thereby improving response time of a security system.

In some embodiments, if a digital interaction matches one or more fuzzy profiles, a security system may scrutinize the digital interaction more closely, even if there is not yet sufficient information to justify classifying the digital interaction as part of an attack. The security system may scrutinize a digital interaction in a non-invasive manner so as to reduce user experience friction.

As an example, a security system may observe an anomalously high percentage of traffic at a retail web site involving a particular product or service, and may so indicate in a fuzzy profile. A digital interaction with an attempted purchase of that product or service may be flagged as matching the fuzzy profile, but that pattern alone may not be sufficiently suspicious, as many users may purchase that product or service for legitimate reasons. To prevent a false positive, one approach may be to send the flagged digital interaction to a human operator for review. Another approach may be to require one or more verification tasks (e.g., captcha challenge, security question, etc.) before approving the attempted purchase. The inventors have recognized and appreciated that both of these approaches may negatively impact user experience.

Accordingly, in some embodiments, a match with a fuzzy profile may trigger additional analysis that is non-invasive. For example, the security system may collect additional data from the digital interaction in a non-invasive manner and may analyze the data in real time, so that by the time the digital interaction progresses to a stage with potential for damage (e.g., charging a credit card), the security system may have already determined whether the digital interaction is likely to be legitimate.

In some embodiments, one or more security probes may be deployed dynamically to obtain information from a digital interaction. For instance, a security probe may be deployed only when a security system determines that there is sufficient value in doing so (e.g., using an understanding of user behavior). As an example, a security probe may be deployed when a level of suspicion associated with the digital interaction is sufficiently high to warrant an investigation (e.g., when the digital interaction matches a fuzzy profile comprising one or more anomalous attributes, or when the digital interaction represents a significant deviation from an activity pattern observed in the past for an anchor value, such as a device identifier, that is reported in the digital interaction).

The inventors have recognized and appreciated that, by reducing a rate of deployment of security probes for surveillance, it may be more difficult for an attacker to detect the surveillance and/or to discover how the surveillance is conducted. As a result, the attacker may not be able to evade the surveillance effectively.

In some embodiments, multiple security probes may be deployed, where each probe may be designed to discover different information. For example, information collected by a probe may be used by a security system to inform the decision of which one or more other probes to deploy next. In this manner, the security system may be able to gain an in-depth understanding of network traffic (e.g., web site and/or application traffic). For example, the security system may be able to: classify traffic in ways that facilitate identification of malicious traffic, define with precision what type of attack is being observed, and/or discover that some suspect behavior is actually legitimate. In some embodiments, a result may indicate not only a likelihood that certain traffic is malicious, but also a likely type of malicious traffic. Therefore, such a result may be more meaningful than just a numeric score.

The inventors have recognized and appreciated that some online behavior scoring systems use client-side checks to collect information. In some instances, such checks are enabled in a client during many interactions, which may give an attacker clear visibility into how the online behavior scoring system works (e.g., what information is collected, what tests are performed, etc.). As a result, an attacker may be able to adapt and evade detection. Accordingly, in some embodiments, techniques are provided for obfuscating client-side functionalities. Used alone or in combination with dynamic probe deployment (which may reduce the number of probes deployed to, for example, one in hundreds of thousands of digital interactions), client-side functionality obfuscation may reduce the likelihood of malicious entities detecting surveillance and/or discovering how the surveillance is conducted. For instance, client-side functionality obfuscation may make it difficult for a malicious entity to test a probe's behavior in a consistent environment.

IV. Further Descriptions

It should be appreciated that the techniques introduced above and discussed in greater detail below may be implemented in any of numerous ways, as the techniques are not limited to any particular manner of implementation. Examples of details of implementation are provided herein solely for illustrative purposes. Furthermore, the techniques disclosed herein may be used individually or in any suitable combination, as aspects of the present disclosure are not limited to the use of any particular technique or combination of techniques.

FIG. 1A shows an illustrative system 10 via which digital interactions may take place, in accordance with some embodiments. In this example, the system 10 includes user devices 11A-C, online systems 12 and 13, and a security system 14. A user 15 may use the user devices 11A-C to engage in digital interactions. For instance, the user device 11A may be a smart phone and may be used by the user 15 to check email and download music, the user device 11B may be a tablet computer and may be used by the user 15 to shop and bank, and the user device 11C may be a laptop computer and may be used by the user 15 to watch TV and play games.

It should be appreciated that the user 15 may engage in other types of digital interactions in addition to, or instead of, those mentioned above, as aspects of the present disclosure are not limited to the analysis of any particular type of digital interaction. Also, digital interactions are not limited to interactions that are conducted via an Internet connection. For example, a digital interaction may involve an ATM transaction over a leased telephone line.

Furthermore, it should be appreciated that the particular combination of user devices 11A-C is provided solely for purposes of illustration, as the user 15 may use any suitable device or combination of devices to engage in digital interactions, and the user may use different devices to engage in a same type of digital interaction (e.g., checking email).

In some embodiments, a digital interaction may involve an interaction between the user 15 and an online system, such as the online system 12 or the online system 13. For instance, the online system 12 may include an application server that hosts a backend of a banking app used by the user 15, and the online system 13 may include a web server that hosts a retailer's web site that the user 15 visits using a web browser. It should be appreciated that the user 15 may interact with other online systems (not shown) in addition to, or instead of, the online systems 12 and 13. For example, the user 15 may visit a pharmacy's web site to have a prescription filled and delivered, a travel agent's web site to book a trip, a government agency's web site to renew a license, etc.

In some embodiments, behaviors of the user 15 may be measured and analyzed by the security system 14. For instance, the online systems 12 and 13 may report, to the security system 14, behaviors observed from the user 15. Additionally, or alternatively, the user devices 11A-C may report, to the security system 14, behaviors observed from the user 15. As one example, a web page downloaded from the web site hosted by the online system 13 may include software (e.g., a JavaScript snippet) that programs the browser running on one of the user devices 11A-C to observe and report behaviors of the user 15. Such software may be provided by the security system 14 and inserted into the web page by the online system 13. As another example, an application running on one of the user devices 11A-C may be programmed to observe and report behaviors of the user 15. The behaviors observed by the application may include interactions between the user 15 and the application, and/or interactions between the user 15 and another application. As another example, an operating system running on one of the user devices 11A-C may be programmed to observe and report behaviors of the user 15.

It should be appreciated that software that observes and reports behaviors of a user may be written in any suitable language, and may be delivered to a user device in any suitable manner. For example, the software may be delivered by a firewall (e.g., an application firewall), a network operator (e.g., Comcast, Sprint, etc.), a network accelerator (e.g., Akamai), or any device along a communication path between the user device and an online system, or between the user device and a security system.

Although only one user (i.e., the user 15) is shown in FIG. 1A, it should be appreciated that the security system 14 may be programmed to measure and analyze behaviors of many users across the Internet. Furthermore, it should be appreciated that the security system 14 may interact with other online systems (not shown) in addition to, or instead of, the online systems 12 and 13. The inventors have recognized and appreciated that, by analyzing digital interactions involving many different users and many different online systems, the security system 14 may have a more comprehensive and accurate understanding of how the users behave. However, aspects of the present disclosure are not limited to the analysis of measurements collected from different online systems, as one or more of the techniques described herein may be used to analyze measurements collected from a single online system. Likewise, aspects of the present disclosure are not limited to the analysis of measurements collected from different users, as one or more of the techniques described herein may be used to analyze measurements collected from a single user.

FIG. 1B shows an illustrative implementation of the security system 14 shown in FIG. 1A, in accordance with some embodiments. In this example, the security system 14 includes one or more frontend systems and/or one or more backend systems. For instance, the security system 14 may include a frontend system 22 configured to interact with user devices (e.g., the illustrative user device 11C shown in FIG. 1A) and/or online systems (e.g., the illustrative online system 13 shown in FIG. 1A). Additionally, or alternatively, the security system 14 may include a backend system 32 configured to interact with a backend user interface 34. In some embodiments, the backend user interface 34 may include a graphical user interface (e.g., a dashboard) for displaying current observations and/or historical trends regarding individual users and/or populations of users. Such an interface may be delivered in any suitable manner (e.g., as a web application or a cloud application), and may be used by any suitable party (e.g., security personnel of an organization).

In the example shown in FIG. 1B, the security system 14 includes a log storage 24. The log storage 24 may store log files comprising data received by the frontend system 22 from user devices (e.g., the user device 11C), online systems (e.g., the online system 13), and/or any other suitable sources. A log file may include any suitable information. For instance, in some embodiments, a log file may include keystrokes and/or mouse clicks recorded from a digital interaction over some length of time (e.g., several seconds, several minutes, several hours, etc.). Additionally, or alternatively, a log file may include other information of interest, such as account identifier, network address, user device identifier, user device characteristics, URL accessed, Stock Keeping Unit (SKU) of viewed product, etc.

In some embodiments, the log storage 24 may store log files accumulated over some suitable period of time (e.g., a few years), which may amount to tens of billions, hundreds of billions, or trillions of log files. Each log file may be of any suitable size. For instance, in some embodiments, about 60 kilobytes of data may be captured from a digital interaction per minute, so that a log file recording a few minutes of user behavior may include a few hundred kilobytes of data, whereas a log file recording an hour of user behavior may include a few megabytes of data. Thus, the log storage 24 may store petabytes of data overall.

The inventors have recognized and appreciated that it may be impractical to retrieve and analyze log files from the log storage 24 each time a request is received to examine a digital interaction for anomalies. For instance, the security system 14 may be expected to respond to a request to detect anomalies within 100 msec, 80 msec, 60 msec, 40 msec, 20 msec, or less. The security system 14 may be unable to identify and analyze all relevant log files from the log storage 24 within such a short window of time. Accordingly, in some embodiments, a log processing system 26 may be provided to filter, transform, and/or route data from the log storage 24 to one or more databases 28.

The log processing system 26 may be implemented in any suitable manner. For instance, in some embodiments, the log processing system 26 may include one or more services configured to retrieve a log file from the log storage 24, extract useful information from the log file, transform one or more pieces of extracted information (e.g., adding latitude and longitude coordinates to an extracted address), and/or store the extracted and/or transformed information in one or more appropriate databases (e.g., among the one or more databases 28).

In some embodiments, the one or more services may include one or more services configured to route data from log files to one or more queues, and/or one or more services configured to process the data in the one or more queues. For instance, each queue may have a dedicated service for processing data in that queue. Any suitable number of instances of the service may be run, depending on a volume of data to be processed in the queue.

The one or more databases 28 may be accessed by any suitable component of the security system 14. As one example, the backend system 32 may query the one or more databases 28 to generate displays of current observations and/or historical trends regarding individual users and/or populations of users. As another example, a data service system 30 may query the one or more databases 28 to provide input to the frontend system 22.

The inventors have recognized and appreciated that some database queries may be time-consuming. For instance, if the frontend system 22 were to query the one or more databases 28 each time a request to detect anomalies is received, the frontend system 22 may be unable to respond to the request within 100 msec, 80 msec, 60 msec, 40 msec, 20 msec, or less. Accordingly, in some embodiments, the data service system 30 may maintain one or more data sources separate from the one or more databases 28. An example of a data source maintained by the data service system 30 is shown in FIG. 2A and discussed below.

In some embodiments, a data source maintained by the data service system 30 may have a bounded size, regardless of how much data is analyzed to populate the data source. For instance, if there is a burst of activities from a certain account, an increased amount of data may be stored in the one or more databases 28 in association with that account. The data service system 30 may process the data stored in the one or more databases 28 down to a bounded size, so that the frontend system 22 may be able to respond to requests in constant time.

Various techniques are described herein for processing incoming data. For instance, in some embodiments, all possible network addresses may be divided into a certain number of buckets. Statistics may be maintained on such buckets, rather than individual network addresses. In this manner, a bounded number of statistics may be analyzed, even if an actual number of network addresses observed may fluctuate over time. One or more other techniques may also be used in addition to, or instead of, bucketing, such as maintaining an array of a certain size.

In some embodiments, the data service system 30 may include a plurality of data services (e.g., implemented using a service-oriented architecture). For example, one or more data services may access the one or more databases 28 periodically (e.g., every hour, every few hours, every day, etc.), and may analyze the accessed data and populate one or more first data sources used by the frontend system 22. Additionally, or alternatively, one or more data services may receive data from the log processing system 26, and may use the received data to update one or more second data sources used by the frontend system 22. Such a second data source may supplement the one or more first data sources with recent data that has arrived since the last time the one or more first data sources were populated using data accessed from the one or more databases 28. In various embodiments, the one or more first data sources may be the same as, or different from, the one or more second data sources, or there may be some overlap.

Although details of implementation are shown in FIG. 1B and discussed above, it should be appreciated that aspects of the present disclosure are not limited to the use of any particular component, or combination of components, or to any particular arrangement of components. Furthermore, each of the frontend system 22, the log processing system 26, the data service system 30, and the backend system 32 may be implemented in any suitable manner, such as using one or more parallel processors operating at a same location or different locations.

FIG. 1C shows an illustrative flow 40 within a digital interaction, in accordance with some embodiments. In this example, the flow 40 may represent a sequence of activities conducted by a user on a merchant's web site. For instance, the user may log into the web site, change billing address, view a product details page of a first product, view a product details page of a second product, add the second product to a shopping cart, and then check out.

In some embodiments, a security system may receive data captured from the digital interaction throughout the flow 40. For instance, the security system may receive log files from a user device and/or an online system involved in the digital interaction (e.g., as shown in FIG. 1B and discussed above).

The security system may use the data captured from the digital interaction in any suitable manner. For instance, as shown in FIG. 1B, the security system may process the captured data and populate one or more databases (e.g., the one or more illustrative databases 28 shown in FIG. 1B). Additionally, or alternatively, the security system may populate one or more data sources adapted for efficient access. For instance, the security system may maintain current interaction data 42 in a suitable data structure (e.g., the illustrative data structure 220 shown in FIG. 2B). As one example, the security system may keep track of different network addresses observed at different points in the flow 40 (e.g., logging in and changing billing address via a first network address, viewing the first and second products via a second network address, and adding the second product to the cart and checking out via a third network address). As another example, the security system may keep track of different credit card numbers used in the digital interaction (e.g., different credit cards being entered in succession during checkout). The data structure may be maintained in any suitable manner (e.g., using the illustrative process 230 shown in FIG. 2C) and by any suitable component of the security system (e.g., the illustrative frontend system 22 and/or the illustrative data service system 30).

In some embodiments, the security system may maintain historical data 44, in addition to, or instead of, the current interaction data 42. In some embodiments, the historical data 44 may include log entries for user activities observed during one or more prior digital interactions. Additionally, or alternatively, the historical data 44 may include one or more profiles associated respectively with one or more anchor values (e.g., a profile associated with a particular device identifier, a profile associated with a particular network address, etc.). However, it should be appreciated that aspects of the present disclosure are not limited to the use of any particular type of historical data, or to any historical data at all. Moreover, any historical data used may be stored in any suitable manner.

In some embodiments, the security system may maintain population data 46, in addition to, or instead of, the current interaction data 42 and/or the historical data 44. For instance, the security system may update, in real time, statistics such as breakdown of web site traffic by user agent, geographical location, product SKU, etc. As one example, the security system may use a hash-modding method to divide all known browser types into a certain number of buckets (e.g., 10 buckets, 100 buckets, etc.). For each bucket, the security system may calculate a percentage of overall web site traffic that falls within that bucket. As another example, the security system may use a hash-modding method to divide all known product SKUs into a certain number of buckets (e.g., 10 buckets, 100 buckets) and calculate respective traffic percentages. Additionally, or alternatively, the security system may calculate respective traffic percentages for combinations of buckets (e.g., a combination of a bucket of browser types, a bucket of product SKUs, etc.).
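
A non-limiting sketch of such a per-bucket traffic breakdown follows (Python; the hash-modding helper repeats the earlier sketch, and the values argument is a hypothetical iterable of attribute values observed in current traffic):

    import hashlib
    from collections import Counter

    def bucket_of(value, num_buckets):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16) % num_buckets

    def traffic_breakdown(values, num_buckets=10):
        # Percentage of overall traffic falling within each bucket.
        counts = Counter(bucket_of(v, num_buckets) for v in values)
        total = sum(counts.values())
        return {bucket: 100.0 * n / total for bucket, n in counts.items()}

    browsers = ["Chrome", "Chrome", "Safari", "Firefox", "Chrome", "Edge"]
    print(traffic_breakdown(browsers))  # e.g., {3: 50.0, 7: 16.7, ...}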

In some embodiments, a security system may perform anomaly detection processing on an on-going basis and may continually create new fuzzy profiles and/or update existing fuzzy profiles. For instance, the security system may compare a certain statistic (e.g., a count of digital interactions reporting Chrome as browser type) from a current time period (e.g., 9:00 pm-10:00 pm today) against the same statistic from a past time period (e.g., 9:00 pm-10:00 pm yesterday, a week ago, a month ago, a year ago, etc.). If the current value of the statistic deviates significantly from the past value of the statistic (e.g., by more than a selected threshold amount), an anomaly may be reported, and the corresponding attribute (e.g., browser type) and attribute value (e.g., Chrome) may be stored in a fuzzy profile.

In some embodiments, the security system may render any one or more aspects of the current interaction data 42, the historical data 44, and/or the population data 46 (e.g., via the illustrative backend user interface 34 shown in FIG. 1B). For instance, the security system may render breakdown of web site traffic (e.g., with actual traffic measurements, or percentages of overall traffic) using a stacked area chart.

FIG. 1C also shows examples of time measurements in the illustrative flow 40. In some embodiments, the security system may receive data captured throughout the flow 40, and the received data may include log entries for user activities such as logging into the web site, changing billing address, viewing the product details page of the first product, viewing the product details page of the second product, adding the second product to the shopping cart, checking out, etc. The log entries may include timestamps, which may be used by the security system to determine an amount of time that elapsed between two points in the digital interaction. For instance, the security system may use the appropriate timestamps to determine how much time elapsed between viewing the second product and adding the second product to the shopping cart, between adding the second product to the shopping cart and checking out, between viewing the second product and checking out, etc.
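
As a minimal sketch of such a timing computation (Python; the log-entry field names and timestamp values are illustrative assumptions):

    # Compute the elapsed time between two logged activities, assuming
    # each log entry carries an activity name and a POSIX timestamp.

    def elapsed_seconds(log_entries, first_activity, second_activity):
        times = {entry["activity"]: entry["timestamp"] for entry in log_entries}
        return times[second_activity] - times[first_activity]

    log = [
        {"activity": "view_product_2", "timestamp": 1441400000.0},
        {"activity": "add_to_cart",    "timestamp": 1441400004.5},
        {"activity": "checkout",       "timestamp": 1441400012.0},
    ]
    print(elapsed_seconds(log, "view_product_2", "checkout"))  # 12.0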

The inventors have recognized and appreciated that certain timing patterns may be indicative of illegitimate digital interactions. For instance, a reseller may use bots to make multiple purchases of a product that is on sale, thereby circumventing a quantity restriction (e.g., one per customer) imposed by a retail web site. Such a bot may be programmed to step through an order quickly, to maximize the total number of orders completed during a promotional period. The resulting timing pattern may be noticeably different from that of a human customer browsing through the web site and taking time to read product details before making a purchase decision. Therefore, a timing pattern such as a delay between product view and checkout may be a useful attribute to monitor in digital interactions.

It should be appreciated that aspects of the present disclosure are not limited to the analysis of online purchases, as one or more of the techniques described herein may be used to analyze other types of digital interactions, including, but not limited to, opening a new account, checking email, transferring money, etc. Furthermore, it should be appreciated that aspects of the present disclosure are not limited to monitoring any particular timing attribute, or any timing attribute at all. In some embodiments, other attributes, such as various anchor types observed from a digital interaction, may be monitored in addition to, or instead of, timing attributes.

FIG. 2A shows an illustrative data structure 200 for recording observations from a digital interaction, in accordance with some embodiments. For instance, the data structure 200 may be used by a security system (e.g., the illustrative security system 14 shown in FIG. 1A) to record distinct anchor values of a same type that have been observed in a certain context. However, that is not required, as in some embodiments the data structure 200 may be used to record other distinct values, instead of, or in addition to, anchor values.

In some embodiments, the data structure 200 may be used to store up to N distinct anchor values of a same type (e.g., N distinct credit card numbers) that have been seen in a digital interaction. For instance, in some embodiments, the data structure 200 may include an array 205 of a certain size N. Once the array has been filled, a suitable method may be used to determine whether to discard a newly observed credit card number, or replace one of the stored credit card numbers with the newly observed credit card number. In this manner, only a bounded amount of data may be analyzed in response to a query, regardless of an amount of raw data that has been received.

In some embodiments, the number N of distinct values may be chosen to provide sufficient information without using an excessive amount of storage space. For instance, a security system may store more distinct values (e.g., 8-16) if precise values are useful for detecting anomalies, and fewer distinct values (e.g., 2-4) if precise values are less important. In some embodiments, N may be 8-16 for network addresses, 4-8 for credit card numbers, and 2-4 for user agents. The security system may use the network addresses to determine if there is a legitimate reason for multiple network addresses being observed (e.g., a user traveling and connecting to a sequence of access points along the way), whereas the security system may only look for a simple indication that multiple user agents have been observed.

It should be appreciated that aspects of the present disclosure are not limited to the use of an array to store distinct values. Other data structures, such as a linked list, a tree, etc., may also be used.

The inventors have recognized and appreciated that it may be desirable to store additional information in the data structure 200, beyond N distinct observed values. For instance, it may be desirable to store an indication of how many distinct values have been observed overall, and how such values are distributed. Accordingly, in some embodiments, possible values may be divided into a plurality of M buckets, and a bit string 210 of length M may be stored in addition to, or instead of, N distinct observed values. Each bit in the bit string 210 may correspond to a respective bucket, and may be initialized to 0. Whenever a value in a bucket is observed, the bit corresponding to that bucket may be set to 1.
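
A non-limiting sketch of such a structure follows (Python; the first-N retention policy and all names are illustrative assumptions; other retention policies are discussed in connection with FIG. 2C below):

    import hashlib

    class ObservedValues:
        """Up to N distinct observed values plus an M-bit bucket string."""

        def __init__(self, n, m):
            self.capacity = n
            self.values = []       # array 205: up to N distinct values
            self.bits = [0] * m    # bit string 210: one bit per bucket

        def observe(self, value):
            # Set the bit for the bucket the value falls into.
            bucket = int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16) % len(self.bits)
            self.bits[bucket] = 1
            # Keep the first N distinct values (one simple policy).
            if value not in self.values and len(self.values) < self.capacity:
                self.values.append(value)

    cards = ObservedValues(n=4, m=16)
    cards.observe("4111111111111111")
    print(cards.values, sum(cards.bits))  # ['4111111111111111'] 1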

Possible values may be divided into buckets in any suitable manner. For instance, in some embodiments, a hash function may be applied to possible values and a modulo operation (with modulus M) may be applied to divide the resulting hashes into M buckets. The modulus M may be chosen to achieve a desired balance between precision and efficiency. For instance, a larger number of buckets may provide a higher resolution (e.g., fewer possible values being lumped together and becoming indistinguishable), but the bit string 210 may take up more storage space, and it may be computationally more complex to update and/or access the bit string 210.

It should be appreciated that aspects of the present disclosure are not limited to the use of hash-modding to divide possible values into buckets, as other methods may also be suitable. For instance, in some embodiments, one or more techniques based on Bloom filters may be used.

FIG. 2B shows an illustrative data structure 220 for recording observations from a digital interaction, in accordance with some embodiments. For instance, the data structure 220 may be used by a security system (e.g., the illustrative security system 14 shown in FIG. 1A) to record distinct anchor values that have been observed in a certain context. However, that is not required, as in some embodiments the data structure 220 may be used to record other distinct values, instead of, or in addition to, anchor values.

In the example shown in FIG. 2B, the data structure 220 may be indexed by a session identifier and a flow identifier. The session identifier may be an identifier assigned by a web server for a web session. The flow identifier may identify a flow (e.g., the illustrative flow 40 shown in FIG. 1C), which may include a sequence of activities. The security system may use the session and flow identifiers to match a detected activity to the digital interaction. However, it should be appreciated that aspects of the present disclosure are not limited to the use of a session identifier and a flow identifier to identify a digital interaction.

In some embodiments, the data structure 220 may include a plurality of components, such as components 222, 224, 226, and 228 shown in FIG. 2B. Each of the components 222, 224, 226, and 228 may be similar to the illustrative data structure 200 shown in FIG. 2A. For instance, the component 222 may store up to a certain number of distinct network addresses observed from the digital interaction, the component 224 may store up to a certain number of distinct user agents observed from the digital interaction, the component 226 may store up to a certain number of distinct credit card numbers observed from the digital interaction, etc.

In some embodiments, the data structure 220 may include a relatively small number (e.g., 10, 20, 30, etc.) of components such as 222, 224, 226, and 228. In this manner, a relatively small amount of data may be stored for each on-going digital interaction, while still allowing a security system to conduct an effective sameness analysis.

In some embodiments, the component 228 may store a list of lists of indices, where each list of indices may correspond to an activity that took place in the digital interaction. For instance, with reference to the illustrative flow 40 shown in FIG. 1C, a first list of indices may correspond to logging in, a second list of indices may correspond to changing billing address, a third list of indices may correspond to viewing the first product, a fourth list of indices may correspond to viewing the second product, a fifth list of indices may correspond to adding the second product to the shopping cart, and a sixth list of indices may correspond to checking out.

In some embodiments, each list of indices may indicate anchor values observed from the corresponding activity. For instance, a list [1, 3, 2, . . . ] may indicate the first network address stored in the component 222, the third user agent stored in the component 224, the second credit card stored in the component 226, etc. This may provide a compact representation of the anchor values observed from each activity.

In some embodiments, if an anchor value stored in a component is replaced by another anchor value, one or more lists of indices including the anchor value being replaced may be updated. For instance, if the first network address stored in the component 222 is replaced by another network address, the list [1, 3, 2, . . . ] may be updated as [φ, 3, 2, . . . ], where φ is any suitable default value (e.g., N+1, where N is the capacity of the component 222).

In some embodiments, a security system may use a list of lists of indices to determine how frequently an anchor value has been observed. For instance, the security system may count a number of lists in which the index 1 appears at the first position. This may indicate a number of times the first network address stored in the component 222 has been observed.
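
A non-limiting sketch of this representation and frequency count follows (Python; the activity labels and 1-based indices follow the example above, and all names are illustrative):

    # Component 228: one list of indices per activity; position 0 names
    # a network address in component 222, position 1 a user agent in
    # component 224, position 2 a credit card in component 226, etc.
    activities = [
        [1, 3, 2],  # logging in
        [1, 3, 2],  # changing billing address
        [2, 3, 2],  # viewing the first product
    ]

    # How often has the first stored network address been observed?
    count = sum(1 for indices in activities if indices[0] == 1)
    print(count)  # 2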

It should be appreciated that the components 222, 224, 226, and 228 are shown in FIG. 2B and discussed above solely for purposes of illustration, as aspects of the present disclosure are not limited to storing any particular information about a current digital interaction, or to any particular way of representing the stored information. For instance, other types of component data structures may be used in addition to, or instead of, the illustrative data structure 200 shown in FIG. 2A.

FIG. 2C shows an illustrative process 230 for recording observations from a digital interaction, in accordance with some embodiments. For instance, the process 230 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1A) to record distinct values of a same type (e.g., N distinct credit card numbers) that have been observed in a certain context (e.g., in a certain digital interaction). The distinct values may be recorded in a data structure such as the illustrative data structure 200 shown in FIG. 2A.

At act 231, the security system may identify an anchor value X in a certain context. For instance, in some embodiments, the anchor value X may be observed from a certain digital interaction. In some embodiments, the security system may access a record of the digital interaction, and may identify from the record a data structure associated with a type T of the anchor value X. For instance, if the anchor value X is a credit card number, the security system may identify, from the record of the digital interaction, a data structure for storing credit card numbers observed from the digital interaction.

At act 232, the security system may identify a bucket B to which the anchor value X belongs. For instance, in some embodiments, a hash-modding operation may be performed to map the anchor value X to the bucket B as described above in connection with FIG. 2A.

At act 233, the security system may store an indication that at least one anchor value from the bucket B has been observed in connection with the digital interaction. For instance, the security system may operate on the data structure identified at act 231. With reference to the example shown in FIG. 2A, the security system may identify, in the illustrative bit string 210, a position that corresponds to the bucket B identified at act 232 and write 1 into that position.

At act 234, the security system may determine whether the anchor value X has already been stored in connection with the relevant context. For instance, the security system may check if the anchor value X has already been stored in the data structure identified at act 231. With reference to the example shown in FIG. 2A, the security system may look up the anchor value X in the illustrative array 205. This lookup may be performed in any suitable manner. For instance, if the array 205 is sorted, the security system may perform a binary search to determine if the anchor value X is already stored in the array 205.

If it is determined at act 234 that the anchor value X has already been stored, the process 230 may end. Although not shown, the security system may, in some embodiments, increment one or more counters for the anchor value X prior to ending the process 230.

If it is determined at act 234 that the anchor value X has not already been stored, the security system may proceed to act 235 to determine whether to store the anchor value X. With reference to the example shown in FIG. 2A, the security system may, in some embodiments, store the anchor value X if the array 205 is not yet full. If the array 205 is full, the security system may determine whether to replace one of the stored anchor values with the anchor value X.

As one example, the security system may store in the array 205 the first N distinct anchor values of the type T observed from the digital interaction, and may discard every subsequently observed anchor value of the type T. As another example, the security system may replace the oldest stored anchor value with the newly observed anchor value, so that the array 205 stores the last N distinct values of the type T observed in the digital interaction. As another example, the security system may store in the array 205 a suitable combination of N anchor values of the type T, such as one or more anchor values observed near a beginning of the digital interaction, one or more anchor values most recently observed from the digital interaction, one or more anchor values most frequently observed from the digital interaction (e.g., based on respective counters stored for anchor values, or lists of indices such as the illustrative component 228 shown in FIG. 2B), and/or one or more other anchor values of interest (e.g., one or more credit card numbers previously involved in credit card cycling attacks).
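
A minimal Python sketch of the illustrative process 230 follows, assuming a hash-modding scheme with 256 buckets, one flag per bucket, and the keep-first-N policy described above; the helper names and the use of SHA-256 are assumptions for illustration.

    import hashlib

    NUM_BUCKETS = 256   # assumed number of buckets
    CAPACITY = 8        # assumed capacity N of the array 205

    def bucket_of(anchor_value):
        # Act 232: hash-mod the anchor value to a bucket (cf. FIG. 2A);
        # SHA-256 is an assumed choice of hash function
        digest = hashlib.sha256(anchor_value.encode()).hexdigest()
        return int(digest, 16) % NUM_BUCKETS

    def record_observation(bits, values, x):
        # bits: a sequence of NUM_BUCKETS flags (cf. bit string 210)
        # values: stored distinct anchor values (cf. array 205)
        bits[bucket_of(x)] = 1      # act 233: mark the bucket as observed
        if x in values:             # act 234: already stored?
            return                  # (counters could be incremented here)
        if len(values) < CAPACITY:  # act 235: store if room remains
            values.append(x)
        # else: keep-first-N policy; replace-oldest or keep-most-frequent
        # policies are equally possible

    bits, values = bytearray(NUM_BUCKETS), []
    record_observation(bits, values, "4111111111111111")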

FIG. 3 shows illustrative attributes that may be monitored by a security system, in accordance with some embodiments. In this example, a security system (e.g., the illustrative security system 14 shown in FIG. 1B) monitors a plurality of digital interactions, such as digital interactions 301, 302, 303, etc. These digital interactions may take place via a same web site. However, that is not required, as one or more of the techniques described herein may be used to analyze digital interactions taking place across multiple web sites.

In the example shown in FIG. 3, the security system monitors different types of attributes. For instance, the security system may record one or more anchor values for each digital interaction, such as network address (attribute 311), email address (attribute 312), account identifier (attribute 313), etc.

The security system may identify an anchor value from a digital interaction in any suitable manner. As one example, the digital interaction may include an attempt to log in, and an email address may be submitted to identify an account associated with the email address. However, that is not required, as in some embodiments a separate account identifier may be submitted and an email address on record for that account may be identified. As another example, the digital interaction may include an online purchase. A phone number may be submitted for scheduling a delivery, and a credit card number may be submitted for billing. However, that is not required, as in some embodiments a phone number and/or a credit card number may be identified from a record of the account from which the online purchase is made. As another example, the security system may examine data packets received in connection with the digital interaction and extract, from the data packets, information such as a source network address and a source device identifier.

It should be appreciated that the examples described above are merely illustrative, as aspects of the present disclosure are not limited to the use of any particular anchor type, or any particular method for identifying an anchor value. Examples of anchor types include, but are not limited to, the following.

-   User information
    -   account identifier
    -   real name, social security number, driver's license number, passport number, etc.
    -   email address
        -   user name, country of user registration, date of user registration, etc.
        -   email domain, DNS, server
        -   status/type/availability/capabilities/software/etc., network details, domain registrar and associated details (e.g., country of domain registrant, contact information of domain registrant, etc.), age of domain, country of domain registration, etc.
    -   phone number
        -   subscriber number, country prefix, country of number, area code, state/province/parish/etc. of area code or number location, if the number is activated, if the number is forwarded, billing type (e.g., premium rate), ownership details (e.g., personal, business, and associated details regarding email, domain, network address, etc.), hardware changes, etc.
    -   location
        -   GPS coordinates, continent, country, territory, state, province, parish, city, time zone, designated market area, metropolitan statistical area, postal code, street name, street number, apartment number, address type (e.g., billing, shipping, home, etc.), etc.
    -   payment
        -   plain text or hash of number of credit card, payment card, debit card, bank card, etc., card type, primary account number (PAN), issuer identification number (IIN), IIN details (e.g., name, address, etc.), date of issue, date of expiration, etc.
-   Device information
    -   brand, model, operating system, user agent, installed components, rendering artifacts, browser capabilities, installed software, available features, available external hardware (e.g., displays, keyboards, network and available associated data), etc.
    -   device identifier, cookie/HTML storage, other device-based storage, secure password storage (e.g., iOS Keychain), etc.
    -   device fingerprint (e.g., from network and environment characteristics)
-   Network information
    -   network address (e.g., IP address, sub address, etc.), network identifier, network access identifier, mobile station equipment identity (IMEI), media access control address (MAC), subscriber identity module (SIM), etc.
    -   IP routing type (e.g., fixed connection, aol, pop, superpop, satellite, cache proxy, international proxy, regional proxy, mobile gateway, etc.), proxy type (e.g., anonymous, distorting, elite/concealing, transparent, http, service provider, socks/socks http, web, etc.), connection type (e.g., anonymized, VPN, Tor, etc.), network speed, network operator, autonomous system number (ASN), carrier, registering organization of network address, organization NAICS code, organization ISIC code, if the organization is a hosting facility, etc.

Returning to FIG. 3, the security system may monitor one or more transaction attributes in addition to, or instead of, one or more anchor types. The security system may identify transaction attribute values from a digital interaction in any suitable manner. As one example, the digital interaction may include a purchase transaction, and the security system may identify information relating to the purchase transaction, such as a SKU for a product being purchased (attribute 321), a count of items in a shopping cart at time of checkout (attribute 322), an average value of items being purchased (attribute 323), etc.

Alternatively, or additionally, the security system may monitor one or more timing attributes, such as time from product view to checking out (attribute 331), time from adding a product to cart to checking out (attribute 332), etc. Illustrative techniques for identifying timing attribute values are discussed in connection with FIG. 2.

It should be appreciated that the attributes shown in FIG. 3 and discussed above are provided solely for purposes of illustration, as aspects of the present disclosure are not limited to the use of any particular attribute or combination of attributes. For instance, in some embodiments, a digital interaction may include a transfer of funds, instead of, or in addition to, a purchase transaction. Examples of transaction attributes for a transfer of funds include, but are not limited to, amount being transferred, name of recipient institution, recipient account number, etc.

FIG. 4 shows an illustrative process 400 for detecting anomalies, in accordance with some embodiments. For instance, the process 400 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1B) to monitor digital interactions taking place at a particular web site. The security system may compare what is currently observed against what was observed previously at the same web site to determine whether there is any anomaly.

At act 405, the security system may identify a plurality of values of an attribute. As discussed in connection with FIG. 3, the security system may monitor any suitable attribute, such as an anchor type (e.g., network address, email address, account identifier, etc.), a transaction attribute (e.g., product SKU, number of items in shopping cart, average value of items purchased, etc.), a timing attribute (e.g., time from product view to checkout, time from adding product to shopping cart to checkout, etc.), etc.

In some embodiments, the security system may identify each value of the attribute from a respective digital interaction. For instance, the security system may monitor digital interactions taking place within a current time period (e.g., 30 minutes, one hour, 90 minutes, two hours, etc.), and may identify a value of the attribute from each digital interaction. However, it should be appreciated that aspects of the present disclosure are not limited to monitoring every digital interaction taking place within some time period. For instance, in some embodiments, digital interactions may be sampled (e.g., randomly) and attribute values may be identified from the sampled digital interactions.

The inventors have recognized and appreciated that it may be impractical to maintain statistics on individual attribute values. For instance, there may be billions of possible network addresses. It may be impractical to maintain a counter for each possible network address to keep track of how many digital interactions are reporting that particular network address. Accordingly, in some embodiments, possible values of an attribute may be divided into a plurality of buckets. Rather than maintaining a counter for each attribute value, the security system may maintain a counter for each bucket of attribute values. For instance, a counter may keep track of a number of digital interactions with any network address from a bucket B of network addresses, as opposed to a number of digital interactions with a particular network address Y. Thus, multiple counters (e.g., a separate counter for each attribute value in the bucket B) may be replaced with a single counter (e.g., an aggregate counter for all attribute values in the bucket B).

In this manner, a desired balance between precision and efficiency may be achieved by selecting an appropriate number of buckets. For instance, a larger number of buckets may provide a higher resolution, but more counters may be maintained and updated, whereas a smaller number of buckets may reduce storage requirement and speed up retrieval and updates, but more information may be lost.

Returning to the example of FIG. 4, the security system may, at act 410, divide the attribute values identified at act 405 into a plurality of buckets. In some embodiments, each bucket may be a multiset. For instance, if two different digital interactions report the same network address, that network address may appear twice in the corresponding bucket.

At act 415, the security system may determine a count of the values that fall within a particular bucket. In some embodiments, a count may be determined for each bucket of the plurality of buckets. However, that is not required, as in some embodiments the security system may only keep track of one or more buckets of interest.

Various techniques may be used to divide attribute values into buckets. As one example, the security system may divide numerical attribute values (e.g., time measurements) into a plurality of ranges. As another example, the security system may use a hash-modding technique to divide numerical and/or non-numerical attribute values into buckets. Other techniques may also be used, as aspects of the present disclosure are not limited to any particular technique for dividing attribute values into buckets.

FIG. 5 shows an illustrative technique for dividing a plurality of numerical attribute values into a plurality of ranges, in accordance with some embodiments. For instance, the illustrative technique shown in FIG. 5 may be used by a security system to divide values of the illustrative attribute 331 (time from product view to checkout) shown in FIG. 3 into a plurality of buckets.

In the example shown in FIG. 5, the plurality of buckets include three buckets corresponding respectively to three ranges of time measurements. For instance, bucket 581 may correspond to a range between 0 and 10 seconds, bucket 582 may correspond to a range between 10 and 30 seconds, and bucket 583 may correspond to a range of greater than 30 seconds.

In some embodiments, thresholds for dividing numeric measurements into buckets may be chosen based on observations from population data. For instance, the inventors have recognized and appreciated that the time from product view to checkout is rarely less than 10 seconds in a legitimate digital interaction, and therefore a high count for the bucket 581 may be a good indicator of an anomaly. In some embodiments, buckets may be defined based on a population mean and a population standard deviation. For instance, there may be a first bucket for values that are within one standard deviation of the mean, a second bucket for values that are between one and two standard deviations away from the mean, a third bucket for values that are between two and three standard deviations away from the mean, and a fourth bucket for values that are more than three standard deviations away from the mean. However, it should be appreciated that aspects of the present disclosure are not limited to the use of population mean and population standard deviation to define buckets. For instance, in some embodiments, a bucket may be defined based on observations from known fraudsters, and/or a bucket may be defined based on observations from known legitimate users.
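
The following sketch illustrates two of the bucketing schemes just described: fixed thresholds, as in the example of FIG. 5, and buckets defined by distance from a population mean. The specific thresholds are assumptions drawn from the example above.

    def bucket_by_range(seconds):
        # Fixed thresholds as in FIG. 5: 0-10 s, 10-30 s, over 30 s
        if seconds <= 10:
            return 581
        if seconds <= 30:
            return 582
        return 583

    def bucket_by_deviation(value, mean, stddev):
        # Buckets by distance from a population mean, capped at
        # "more than three standard deviations away"
        k = abs(value - mean) / stddev
        return min(int(k), 3)  # 0, 1, 2, or 3+ standard deviations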

In some embodiments, the security system may identify a plurality of values of the attribute 331 from a plurality of digital interactions. For instance, the security system may identify from each digital interaction an amount of time that elapsed between viewing a product details page for a product and checking out (e.g., as discussed in connection with FIGS. 1C and 3). In the example shown in FIG. 5, nine digital interactions are monitored, and nine values of the attribute 331 (time from product view to checkout) are obtained. It should be appreciated that aspects of the present disclosure are not limited to monitoring any particular number of digital interactions. For instance, in some embodiments, some or all of the digital interactions taking place during a certain period of time may be monitored, and the number of digital interactions may fluctuate depending on a traffic volume at one or more relevant web sites.

In the example shown in FIG. 5, the nine values are divided into the buckets 581-583 based on the corresponding ranges, resulting in four values (i.e., 10 seconds, 1 second, 2 seconds, and 2 seconds) in the bucket 581, three values (i.e., 25 seconds, 15 seconds, and 30 seconds) in the bucket 582, and two values (i.e., 45 seconds and 90 seconds) in the bucket 583. In this manner, numerical data collected by the security system may be quantized to reduce a number of possible values for a particular attribute, for example, from thousands of possible values or more (3600 seconds, assuming time is recorded up to one hour) to three possible values (three ranges). This may allow the security system to analyze the collected data more efficiently. However, it should be appreciated that aspects of the present disclosure are not limited to the use of any particular quantization technique, or any quantization technique at all.

FIG. 7A shows an illustrative histogram 700 representing a distribution of numerical attribute values among a plurality of buckets, in accordance with some embodiments. For instance, the histogram 700 may represent a result of dividing a plurality of time attribute values into a plurality of ranges, as discussed in connection with act 415 of FIG. 4. The time attribute values may be values of the illustrative attribute 331 (time from product view to checkout) shown in FIG. 3.

In the example of FIG. 7A, the histogram 700 includes a plurality of bars, where each bar may correspond to a bucket, and each bucket may correspond to a range of time attribute values. The height of each bar may represent a count of values that fall into the corresponding bucket. For instance, the count for the second bucket (between 1 and 5 minutes) may be higher than the count for the first bucket (between 0 and 1 minute), while the count for the third bucket (between 5 and 15 minutes) may be the highest, indicating that a delay between product view and checkout most frequently falls between 5 and 15 minutes.

In some embodiments, a number M of buckets may be selected to provide an appropriate resolution to analyze measured values for an attribute, while managing storage requirement. For instance, more buckets may provide higher resolution, but more counters may be stored. Moreover, the buckets may correspond to ranges of uniform length, or variable lengths. For instance, in some embodiments, smaller ranges may be used where attribute values tend to cluster (e.g., smaller ranges below 15 minutes), and/or larger ranges may be used where attribute values tend to be sparsely distributed (e.g., larger ranges above 15 minutes). As an example, if a bucket has too many values (e.g., above a selected threshold number), the bucket may be divided into two or more smaller buckets. As another example, if a bucket has too few values (e.g., below a selected threshold number), the bucket may be merged with one or more adjacent buckets. In this manner, useful information about distribution of the attribute values may be made available, without storing too many counters.

FIG. 6 shows an illustrative hash-modding technique for dividing numerical and/or non-numerical attribute values into buckets, in accordance with some embodiments. For instance, the illustrative technique shown in FIG. 6 may be used by a security system to divide values of the illustrative attribute 311 (IP addresses) shown in FIG. 3 into a plurality of buckets.

In some embodiments, a hash-modding technique may involve hashing an input value and performing a modulo operation on the resulting hash value. In the example shown in FIG. 6, nine digital interactions are monitored, and nine values of the attribute 311 (IP address) are obtained. These nine IP addresses may be hashed to produce nine hash values, respectively. The following values may result from extracting the two least significant hexadecimal digits from each hash value: 93, 93, 41, 41, 9a, 9a, 9a, 9a, 9a. This extraction process may be equivalent to performing a modulo operation (i.e., mod 256) on the hash values.

In some embodiments, each residue of the modulo operation may correspond to a bucket of attribute values. For instance, in the example shown in FIG. 6, the residues 93, 41, and 9a correspond, respectively, to buckets 681-683. As a result, there may be two attribute values in each of the bucket 681 and the bucket 682, and five attribute values in the bucket 683.
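
A brief Python sketch of hash-modding follows. With a modulus of 256, taking the residue is equivalent to keeping the two least significant hexadecimal digits of the hash, as in FIG. 6; the choice of SHA-256 and the sample addresses are assumptions for illustration.

    import hashlib
    from collections import Counter

    def hash_mod(value, modulus=256):
        digest = hashlib.sha256(value.encode()).hexdigest()
        # Equivalent to int(digest[-2:], 16) when modulus is 256
        return format(int(digest, 16) % modulus, "02x")

    counts = Counter(hash_mod(ip) for ip in [
        "203.0.113.7", "203.0.113.7", "198.51.100.2",
        "198.51.100.2", "192.0.2.1", "192.0.2.1", "192.0.2.1"])
    # counts maps residues (such as "93", "41", or "9a" in FIG. 6)
    # to the number of observed values falling into each bucket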

FIG. 7B shows an illustrative histogram 720 representing a distribution of attribute values among a plurality of buckets, in accordance with some embodiments. For instance, the histogram 720 may represent a result of dividing a plurality of attribute values into a plurality of buckets, as discussed in connection with act 415 of FIG. 4. The attribute values may be values of the illustrative attribute 311 (IP addresses) shown in FIG. 3. Each attribute value may be converted into a hash value, and a modulo operation may be applied to map each hash value to a residue, as discussed in connection with FIG. 6.

In the example of FIG. 7B, the histogram 720 includes a plurality of bars, where each bar may correspond to a bucket, and each bucket may correspond to a residue of the modulo operation. The height of each bar may represent a count of values that fall into the corresponding bucket. For instance, the count for the third bucket (residue “02”) is higher than the count for the first bucket (residue “00”) and the count for the second bucket (residue “01”), indicating that one or more IP addresses that hash-mod to “02” are frequently observed.

In some embodiments, a modulus M of the modulo operation (which determines how many buckets there are) may be selected to provide an appropriate resolution to analyze measured values for an attribute, while managing storage requirement. For instance, more buckets may provide higher resolution, but more counters may be stored. Moreover, in some embodiments, buckets may be further divided and/or merged. As one example, if a bucket has too many values (e.g., above a selected threshold number), the bucket may be divided into smaller buckets. For instance, the bucket for hash values ending in “00” may be divided into 16 buckets for hash values ending, respectively, in “000,” “100,” . . . , “f00,” or into two buckets, the first for hash values ending in “000,” “100,” . . . , or “700,” the second for hash values ending in “800,” “900,” . . . , or “f00.” As another example, if a bucket has too few values (e.g., below a selected threshold number), the bucket may be merged with one or more other buckets. In this manner, useful information about distribution of the attribute values may be made available, without storing too many counters.

Returning to the example of FIG. 4, the security system may, at act 420, compare the count determined at act 415 against historical information. In some embodiments, the historical information may include an expected count for the same bucket, and the security system may compare the count determined at act 415 against the expected count.

The determination at act 415 and the comparison at act 420 may be performed for any number of one or more buckets. For instance, in some embodiments, a histogram obtained at act 415 (e.g., the illustrative histogram 720 shown in FIG. 7B) may be compared against an expected histogram obtained from historical information.

FIG. 8A shows an illustrative expected histogram 820 representing a distribution of attribute values among a plurality of buckets, in accordance with some embodiments. The expected histogram 820 may be calculated in a similar manner as the illustrative histogram 720 of FIG. 7B, except that attribute values used to calculate the expected histogram 820 may be obtained from a plurality of past digital interactions, such as digital interactions from a past period of time during which there is no known attack (or no known large-scale attack) on one or more relevant web sites. Thus, the expected histogram 820 may represent an acceptable pattern.

FIG. 8B shows a comparison between the illustrative histogram 720 of FIG. 7B and the illustrative expected histogram 820 of FIG. 8A, in accordance with some embodiments. FIG. 9 shows illustrative time periods 902 and 904, in accordance with some embodiments. For instance, attribute values that are used to calculate the illustrative histogram 720 of FIG. 7B may be obtained from digital interactions taking place during the time period 902, whereas attribute values that are used to calculate the illustrative expected histogram 820 of FIG. 8A may be obtained from digital interactions taking place during the time period 904. In some embodiments, the security system may perform anomaly detection processing on a rolling basis. Whenever anomaly detection processing is performed, the time period 902 may be near a current time, whereas the time period 904 may be in the past.

In some embodiments, the time periods 902 and 904 may have a same length (e.g., 30 minutes, one hour, 90 minutes, 2 hours, etc.), and/or may cover a same time of day, so that the comparison between the histogram 720 and the expected histogram 820 may be more meaningful. In some embodiments, multiple comparisons may be made using different expected histograms, such as expected histograms for a time period of a same length from an hour ago, two hours ago, etc., and/or a same time period from a day ago, a week ago, a month ago, a year ago, etc. For instance, if a significant deviation is detected between the histogram 720 and an expected histogram (e.g., from a day ago), the security system may compare the histogram 720 against an expected histogram that is further back in time (e.g., from a week ago, a month ago, a year ago, etc.). This may allow the security system to take into account cyclical patterns (e.g., higher traffic volume on Saturdays, before Christmas, etc.).
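
One way such multi-period comparisons might be coded is sketched below; the lookback labels, the threshold, and the load_expected helper are assumptions, and a real system might use normalized counts or per-bucket thresholds instead.

    LOOKBACKS = ["1 hour ago", "1 day ago", "1 week ago", "1 year ago"]

    def is_anomalous(current, load_expected, threshold=100):
        # current: dict mapping bucket -> count for the time period 902;
        # load_expected: assumed helper returning the expected histogram
        # for a given lookback (cf. the time period 904)
        for lookback in LOOKBACKS:
            expected = load_expected(lookback)
            buckets = set(current) | set(expected)
            # If no bucket deviates significantly at this lookback, the
            # deviation is explained by a cyclical pattern: not anomalous
            if all(current.get(b, 0) - expected.get(b, 0) <= threshold
                   for b in buckets):
                return False
        return True  # significant deviation at every lookback examined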

Returning to the example of FIG. 4, the security system may, at act 425, determine if there is any anomaly associated with the attribute in question (e.g., time from product view to checkout, IP address, etc.). For instance, with reference to FIG. 8B, the third bar (residue “02”) in the histogram 720 may exceed the third bar in the expected histogram 820 by a significant amount (e.g., more than a selected threshold amount). Thus, the security system may infer a possible attack from an IP address that hash-mods into “02.” The security system may store the attribute (e.g., IP address) and the particular bucket exhibiting an anomaly (e.g., residue “02”) in a fuzzy profile. As discussed in connection with FIG. 14, incoming digital interactions may be analyzed against the fuzzy profile, and one or more security measures may be imposed on matching digital interactions (e.g., digital interactions involving IP addresses that hash-mod into “02”). For example, one or more security probes may be deployed to investigate the matching digital interactions.
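
A fuzzy profile of this kind might be recorded as simply as the following sketch, in which the profile maps each anomalous attribute to the set of buckets that exceeded expectation; the names are hypothetical.

    fuzzy_profile = {}

    def flag_anomaly(profile, attribute, bucket):
        # Record that the given bucket of the given attribute is anomalous
        profile.setdefault(attribute, set()).add(bucket)

    flag_anomaly(fuzzy_profile, "ip_address", "02")
    # An incoming digital interaction whose IP address hash-mods to "02"
    # would then match the profile with respect to this attribute (FIG. 14)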

The inventors have recognized and appreciated that the illustrative techniques discussed in connection with FIG. 4 may provide flexibility in anomaly detection. For instance, the expected histogram 820 may be customized for a web site, by using only digital interactions taking place on that web site. Moreover, expected histograms may evolve over time. For instance, on any given day, the security system may use digital interactions from the day before (or a week ago, a month ago, a year ago, etc.) to calculate an expected histogram. In this manner, expected histograms may follow trends on the web site and remain up-to-date.

The inventors have recognized and appreciated that the illustrative techniques discussed in connection with FIG. 4 may facilitate detection of unknown anomalies. As one example, an unexpected increase in traffic from a few IP addresses may be an indication of a coordinated attack from computer resources controlled by an attacker. As another example, an unexpected spike in orders of a particular product SKU from a web site may be an indication of a potential pricing mistake and resellers ordering large quantities of that particular product SKU. A security system that merely looks for known anomalous patterns may not be able to detect such emergent anomalies.

Although details of implementation are shown in FIGS. 4-9 and discussed above, it should be appreciated that aspects of the present disclosure are not limited to such details. For instance, in some embodiments, the security system may compute a normalized count for a bucket, which may be a ratio between a count for the individual bucket and a total count among all buckets. The normalized count may then be compared against an expected normalized count, in addition to, or instead of, comparing the count against an expected count as described in connection with FIG. 8B.

The inventors have recognized and appreciated that normalization may be used advantageously to reduce false positives. For instance, during traditional holiday shopping seasons, or during an advertised sales special, there may be an increase of shopping web site visits and checkout activities. Such an increase may lead to an increase of absolute counts across multiple buckets. A comparison between a current absolute count for an individual bucket and an expected absolute count (e.g., an absolute count for that bucket observed a week ago) may show that the current absolute count exceeds the expected absolute count by more than a threshold amount, which may lead to a false positive identification of anomaly. By contrast, a comparison between a current normalized count and an expected normalized count may remain reliable despite an across-the-board increase in activities.

FIG. 10 shows an illustrative normalized histogram 1000, in accordance with some embodiments. In this example, each bar in the histogram 1000 corresponds to a bucket, and a height of the bar corresponds to a normalized count obtained by dividing an absolute count for the bucket by a sum of counts from all buckets. For instance, the first bucket may account for 10% of all digital interactions, the second bucket 15%, the third bucket 30%, the fourth bucket 15%, etc.

In some embodiments, a normalized histogram may be used at acts 415-420 of the illustrative process 400 of FIG. 4, instead of, or in addition to, a histogram with absolute counts. For instance, with increased sales activities during a holiday shopping season, an absolute count in a bucket may increase significantly from a week or a month ago, but a normalized count may remain roughly the same. If, on the other hand, an attack is taking place via digital interactions originating from a small number of IP addresses, a bucket to which one or more of the malicious IP addresses are mapped (e.g., via hash-modding) may account for an increased percentage of all digital interactions.
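
The effect of normalization can be seen in the short sketch below, with assumed counts in which holiday traffic doubles every bucket: the absolute counts deviate substantially, but the normalized histograms agree.

    def normalize(histogram):
        # Convert absolute counts into proportions of the total
        total = sum(histogram.values()) or 1
        return {b: c / total for b, c in histogram.items()}

    current = {"00": 200, "01": 300, "02": 600, "03": 300}   # holiday traffic
    expected = {"00": 100, "01": 150, "02": 300, "03": 150}  # a week earlier
    # Every absolute count has doubled, yet the distribution is unchanged,
    # so a normalized comparison raises no false positive
    assert normalize(current) == normalize(expected)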

The inventors have recognized and appreciated that it may be beneficial to examine how histograms for an attribute evolve over time. For instance, more digital interactions may be expected from a certain time zone during daytime for that time zone, and a deviation from that pattern may indicate an anomaly. Accordingly, in some embodiments, an array of histograms may be built, where each histogram may correspond to a separate window of time.

FIG. 11 shows an illustrative array 1100 of histograms over time, in accordance with some embodiments. In this example, the array 1100 includes 24 histograms, each corresponding to a one-hour window. For instance, there may be a histogram for a current time, a histogram for one hour prior, a histogram for two hours prior, etc. These histograms may show statistics for a same attribute, such as IP address.

In the example shown in FIG. 11, there are four buckets for the attribute. For instance, the attribute may be IP address, and an IP address may be mapped to one of the four buckets based on a time zone associated with the IP address. For instance, buckets 1120, 1140, 1160, and 1180 may correspond, respectively, to Eastern, Central, Mountain, and Pacific.

The illustrative array 1100 shows peak activity levels in the bucket 1120 at hour markers −18, −19, and −20, which may be morning hours for the Eastern time zone. The illustrative array 1100 also shows peak activity levels in the bucket 1160 at hour markers −16, −17, and −19, which may be morning hours for the Mountain time zone. These may be considered normal patterns. Although not shown, a spike of activities at nighttime may indicate an anomaly.

Although a particular time resolution (i.e., 24 one-hour windows) is used in the example of FIG. 11, it should be appreciated that aspects of the present disclosure are not limited to any particular time resolution. One or more other time resolutions may be used additionally, or alternatively, such as 12 five-minute windows, seven one-day windows, 14 one-day windows, four one-week windows, etc. Furthermore, aspects of the present disclosure are not limited to the use of an array of histograms.
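
An array of histograms such as the array 1100 might be maintained as a simple ring buffer, as in the sketch below; the 24 one-hour windows and the Counter-based histograms are assumptions matching the example of FIG. 11.

    from collections import Counter, deque

    history = deque(maxlen=24)  # one histogram per one-hour window
    current_hour = Counter()    # histogram being built for the current hour

    def observe(bucket):
        # e.g., bucket may be a time zone (Eastern, Central, etc.)
        current_hour[bucket] += 1

    def roll_window():
        # Called at each hour boundary: archive the finished histogram
        # and start a fresh one; the oldest window is discarded
        global current_hour
        history.append(current_hour)
        current_hour = Counter()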

The inventors have recognized and appreciated that digital interactions associated with an attack may exhibit anomalies in multiple attributes. Accordingly, in some embodiments, a profile may be generated with a plurality of attributes to increase accuracy and/or efficiency of anomaly detection. For instance, a plurality of attributes may be monitored, and the illustrative process 400 of FIG. 4 may be performed for each attribute to determine if that attribute is anomalous (e.g., by building a histogram, or an array of histograms as discussed in connection with FIG. 11). In this manner, risk assessment may be performed in multiple dimensions, which may improve accuracy.

In some embodiments, one or more attributes may be selected so that a detected anomaly in any of the one or more attributes may be highly indicative of an attack. However, the inventors have recognized and appreciated that, while anomalies in some attributes may be highly indicative of attacks, such anomalies may rarely occur, so that it may not be worthwhile to expend time and resources (e.g., storage, processor cycles, etc.) to monitor those attributes. Accordingly, in some embodiments, an attribute may be selected only if anomalies in that attribute are observed frequently in known attacks (e.g., in higher than a selected threshold percentage of attacks).

The inventors have further recognized and appreciated that anomalies in one attribute may be correlated with anomalies in another attribute. For instance, there may be a strong correlation between time zone and language, so that an observation of an anomalous time zone value may not provide a lot of additional information if a corresponding language value is already known to be anomalous, or vice versa. Accordingly, in some embodiments, the plurality of attributes may be selected to be pairwise independent.

FIG. 12 shows an illustrative profile 1200 with multiple anomalous attributes, in accordance with some embodiments. In this example, the illustrative profile includes at least three attributes: time from product view to checkout, email domain, and product SKU. Three illustrative histograms 1220, 1240, and 1260 may be built for these attributes, respectively. For instance, each of the histograms 1220, 1240, and 1260 may be built based on recent digital interactions at a relevant web site, using one or more of the techniques described in connection with FIGS. 4-7B.

In the example of FIG. 12, the histograms 1220, 1240, and 1260 are compared against three expected histograms, respectively. In some embodiments, an expected histogram may be calculated based on historical data. As one example, each bar in an expected histogram may be calculated as a moving average over some length of time. As another example, an expected histogram may be a histogram calculated from digital interactions that took place in a past period of time, for instance, as discussed in connection with FIGS. 8A-9.

In the example of FIG. 12, each of the histograms 1220, 1240, and 1260 has an anomalous value. For instance, the third bucket for the histogram 1220 may show a count 1223 that is substantially higher (e.g., more than a threshold amount higher) than an expected count 1226 in the corresponding expected histogram, the fourth bucket for the histogram 1240 may show a count 1244 that is substantially higher (e.g., more than a threshold amount higher) than an expected count 1248 in the corresponding expected histogram, and the last bucket for the histogram 1260 may show a count 1266 that is substantially higher (e.g., more than a threshold amount higher) than an expected count 1272 in the corresponding expected histogram. In some embodiments, different thresholds may be used to determine anomaly for different attributes, as some attributes may have counts that tend to fluctuate widely over time, while other attributes may have counts that tend to stay relatively stable.

Although a particular combination of attributes is shown in FIG. 12 and described above, it should be appreciated that aspects of the present disclosure are not so limited. Any suitable one or more attributes may be used in a fuzzy profile for anomaly detection.

The inventors have recognized and appreciated that when information is collected from a digital interaction, not all of the collected information may be useful for anomaly detection. For instance, if a particular operating system has a certain vulnerability that is exploited in an attack, and the vulnerability exists in all versions of the operating system, a stronger anomalous pattern may emerge if all digital interactions involving that operating system are analyzed together, regardless of version number. If, by contrast, digital interactions are stratified by version number, each version number may deviate from a respective expected pattern only moderately, which may make the attack more difficult to detect.

Accordingly, in some embodiments, an entropy reduction operation may be performed on an observation from a digital interaction to remove information that may not be relevant for assessing a level of risk associated with the digital interaction. In this manner, less information may be processed, which may reduce storage requirement and/or improve response time of a security system.

FIG. 13 shows an illustrative process 1300 for detecting anomalies, in accordance with some embodiments. Like the illustrative process 400 of FIG. 4, the process 1300 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1B) to monitor digital interactions taking place at a particular web site. The security system may compare what is currently observed against what was observed previously at the same web site to determine whether there is any anomaly.

At act 1305, the security system may record a plurality of observations relating to an attribute. As discussed in connection with FIG. 3, the security system may monitor any suitable attribute, such as an anchor type (e.g., network address, email address, account identifier, etc.), a transaction attribute (e.g., product SKU, number of items in shopping cart, average value of items purchased, etc.), a timing attribute (e.g., time from product view to checkout, time from adding product to shopping cart to checkout, etc.), etc.

In some embodiments, the security system may record each observation from a respective digital interaction. Instead of dividing the observations themselves into a plurality of buckets, the security system may, at act 1308, perform an entropy reduction operation on each observation, thereby deriving a plurality of attribute values. The plurality of attribute values may then be divided into buckets, for instance, as discussed in connection with act 410 of FIG. 4. The remainder of the process 1300 may proceed as described in connection with FIG. 4.

As one example of entropy reduction, two observations relating to user agent may be recorded as follows:

-   Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
-   Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36

The inventors have recognized and appreciated that the operating system Mac OS X may often be associated with attacks, regardless of version number (e.g., 10_11_6 versus 10_11_4).

If hash-modded directly, the above strings may land in two different buckets. As a result, an increase in traffic (e.g., 1000 digital interactions per hour) involving the above strings may be split between the two buckets, where each bucket may show a smaller increase (e.g., about 500 digital interactions per hour), and the security system may not be sufficiently confident to flag an anomaly.

Accordingly, in some embodiments, the security system may strip the operating system version numbers from the above strings at act 1308 of the illustrative process 1300 of FIG. 13. Additionally, or alternatively, the Mozilla version numbers “5.0” may be reduced to “5,” the AppleWebKit version numbers “537.36” may be reduced to “537,” the Chrome version numbers “52.0.2743.116” may be reduced to “52,” and the Safari version numbers “537.36” may be reduced to “537.” As a result, both of the above strings may be reduced to a common attribute value:

-   mozilla5macintoshintelmacosx10applewebkit537khtmllikegeckochrome52safari537

In this manner, digital interactions involving the two original strings may be aggregated into one bucket, which may accentuate anomalies and facilitate detection.
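
A possible Python rendering of this reduction is shown below. The two regular expressions are assumptions chosen so that the two user agent strings above reduce to the common attribute value shown; a production system might apply different or additional rules.

    import re

    def reduce_user_agent(ua):
        ua = ua.lower()
        # Keep only the leading (major) number of each version string,
        # e.g., "10_11_4" -> "10", "52.0.2743.116" -> "52"
        ua = re.sub(r"(\d+)[._\d]*", r"\1", ua)
        # Drop spaces, slashes, parentheses, and other punctuation
        return re.sub(r"[^a-z0-9]", "", ua)

    ua1 = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) "
           "AppleWebKit/537.36 (KHTML, like Gecko) "
           "Chrome/52.0.2743.116 Safari/537.36")
    ua2 = ua1.replace("10_11_4", "10_11_6")
    # Both strings reduce to the common value shown above
    assert reduce_user_agent(ua1) == reduce_user_agent(ua2)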

In some embodiments, entropy reduction may be performed incrementally. For instance, the security system may first strip out operating system version numbers. If no discernible anomaly emerges, the security system may strip out AppleWebKit version numbers. This may continue until some discernible anomaly emerges, or all version numbers have been stripped out.

As another example of entropy reduction, an observation relating to display size may be recorded as follows:

-   1024×768, 1440×900

There may be two sets of display dimensions because a computer used for the digital interaction may have two displays. In some embodiments, the security system may sort the display dimensions in some appropriate order (e.g., low to high, or high to low), which may result in the following:

-   768, 900, 1024, 1440

The inventors have recognized and appreciated that sorting may allow partial matching. However, it should be appreciated that aspects of the present disclosure are not limited to sorting display dimensions.

In some embodiments, the security system may reduce the display dimensions, for example, by dividing the display dimensions by 100 and then rounding (e.g., using a floor or ceiling function). This may result in the following:

-   8, 9, 10, 14

Thus, small differences in display dimensions may be removed. Such differences may occur due to changes in window sizes. For example, a height of a task bar may change, or the task bar may only be present sometimes. Such changes may be considered unimportant for anomaly detection.
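
This reduction might be sketched as follows, assuming the display sizes arrive as width-height pairs; the divisor of 100 and the use of round follow the example above.

    def reduce_display_sizes(displays):
        # Sort all dimensions (allowing partial matching), then divide
        # by 100 and round to discard small differences such as a task
        # bar appearing or disappearing
        dims = sorted(d for display in displays for d in display)
        return [round(d / 100) for d in dims]

    assert reduce_display_sizes([(1024, 768), (1440, 900)]) == [8, 9, 10, 14]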

Although the inventors have recognized and appreciated various advantages of entropy reduction, it should be appreciated that aspects of the present disclosure are not limited to any particular entropy reduction technique, or to the use of entropy reduction at all.

FIG. 14 shows an illustrative process 1400 for matching a digital interaction to a fuzzy profile, in accordance with some embodiments. For instance, the process 1400 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1B) to determine if a digital interaction is likely part of an attack.

In the example shown in FIG. 14, the fuzzy profile is built (e.g., using the illustrative process 400 shown in FIG. 4) for detecting illegitimate resellers. For instance, the profile may store one or more attributes that are anomalous. Additionally, or alternatively, the profile may store, for each anomalous attribute, an attribute value that is anomalous, and/or an indication of an extent to which that attribute value deviates from expectation.

In some embodiments, an anomalous attribute may be product SKU, and an anomalous attribute value may be a particular hash-mod bucket (e.g., the last bucket in the illustrative histogram 1260 shown in FIG. 12). The profile may store an indication of an extent to which an observed count for that bucket (e.g., the count 1266) deviates from an expected count (e.g., the count 1272). As one example, the profile may store a percentage by which the observed count exceeds the expected count. As another example, the profile may store an amount by which the observed count exceeds the expected count. As another example, the profile may store an indication of a distance between the observed count and the expected count. For instance, the expected count may be an average count for the particular bucket over some period of time, and the distance may be expressed in terms of a standard deviation (e.g., one standard deviation away from the average count, two standard deviations away, etc.).

Returning to FIG. 14, the security system may, at act 1405, identify a plurality of attributes from the fuzzy profile. In some embodiments, digital interactions with a retailer's web store may be analyzed to distinguish possible resellers from retail customers who purchase goods for their own use. A reseller profile for use in such an analysis may contain attributes such as the following.

-   Product SKU
-   Email Domain of purchaser
-   Browser Type
-   Web Session Interaction Time

At act 1410, the security system may select an anomalous attribute (e.g., product SKU) and identify one or more values that are anomalous (e.g., one or more hash-mod buckets with anomalously high counts). At act 1415, the security system may determine if the digital interaction that is being analyzed matches the fuzzy profile with respect to the anomalous attribute. For instance, the security system may identify a hash-mod bucket for a product SKU that is being purchased in the digital interaction, and determine whether that hash-mod bucket is among one or more anomalous hash-mod buckets stored in the profile for the product SKU attribute. If there is a match, the security system may so record.

At act 1420, the security system may determine if there is another anomalous attribute to be processed. If so, the security system may return to act 1410. Otherwise, the security system may proceed to act 1425 to calculate a penalty score. The penalty score may be calculated in any suitable manner. In some embodiments, the penalty score is determined based on a ratio between a count of anomalous attributes with respect to which the digital interaction matches the profile, and a total count of anomalous attributes. Illustrative code for calculating the penalty score is shown below.

    PENALTY_MIN = 100
    PENALTY_MAX = 500
    PENALTY = 0
    PARAMETERS = array(sku_histograms, domain_histograms, browser_histograms, time_histograms)
    NUM_MATCH = 0
    FOREACH PARAMETERS as PARAM
      IF isAnomalous(PARAM)
        NUM_MATCH++
    // Minimum threshold for anomaly (e.g., two out of four match) may be set in any suitable way
    // If threshold exceeded, linear interpolation between MIN and MAX
    IF (NUM_MATCH >= 2)
      RATIO = NUM_MATCH / COUNT(PARAMETERS)
      PENALTY = ((PENALTY_MAX - PENALTY_MIN) * RATIO) + PENALTY_MIN
      triggerSignal("RESELLER", PENALTY)
    END

Additionally, or alternatively, an attribute penalty score may be determined for a matching attribute based on an extent to which an observed count for a matching bucket deviates from an expected count for that bucket. An overall penalty score may then be calculated based on one or more attribute penalty scores (e.g., as a weighted sum).
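
A sketch of this alternative follows; the deviation measure (relative excess over the expected count) and the per-attribute weights are assumptions for illustration.

    def attribute_penalty(observed, expected):
        # Grows with how far the matching bucket's observed count
        # exceeds its expected count; zero when there is no excess
        return max(0.0, (observed - expected) / max(expected, 1))

    def overall_penalty(matches, weights):
        # matches: attribute -> (observed, expected) for matching buckets
        return sum(weights.get(attr, 1.0) * attribute_penalty(obs, exp)
                   for attr, (obs, exp) in matches.items())

    score = overall_penalty(
        {"sku": (900, 300), "email_domain": (500, 400)},
        {"sku": 2.0, "email_domain": 1.0})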

In this example, a penalty score calculated using a reseller profile may indicate a likelihood that a reseller is involved in a digital interaction. Such a penalty score may be used in any suitable manner. For instance, the web retailer may use the penalty score to decide whether to initiate one or more actions, such as canceling an order already placed by the suspected reseller, suspending the suspected reseller's account, and/or preventing creation of a new account by an entity linked to the suspected reseller's account.

It should be appreciated that the reseller profile described above in connection with FIG. 14 is provided merely for purposes of illustration. Aspects of the present disclosure are not limited to monitoring any particular attribute or combination of attributes to identify resellers, nor to the use of a reseller profile at all. In various embodiments, any suitable attribute may be monitored to detect any type of anomaly, in addition to, or instead of, reseller activity.

In some embodiments, one or more past digital interactions may be identified, using any suitable method, as part of an attack. Each such digital interaction may be associated with an anchor value (e.g., IP address, name, account ID, email address, device ID, device fingerprint, user ID, hashed credit card number, etc.), and the anchor value may in turn be associated with a behavior profile. Thus, one or more behavior profiles may be identified as being associated with the attack and may be used to build a fuzzy profile.

In some embodiments, a fuzzy profile may include any suitable combination of one or more attributes, which may, although need not, coincide with one or more attributes of the behavior profiles from which the fuzzy profile is built. For instance, the fuzzy profile may store a range or limit of values for an attribute, where the range or limit may be determined based on values of the attribute stored in the behavior profiles.

FIG. 15 shows an illustrative fuzzy profile 1500, in accordance with some embodiments. In this example, three individual behaviors A, B, and C are observed in known malicious digital interactions. For instance, each of the behaviors A, B, and C may be observed in 20% of known malicious digital interactions (although it should be appreciated that behaviors observed at different frequencies may also be analyzed together).

The inventors have recognized and appreciated that although each of the behaviors A, B, and C, individually, may be a poor indicator of whether a digital interaction exhibiting that behavior is part of an attack, certain combinations of the behaviors A, B, and C may provide more reliable indicators. For example, if a digital interaction exhibits both behaviors A and B, there may be a high likelihood (e.g., 80%) that the digital interaction is part of an attack, whereas if a digital interaction exhibits both behaviors B and C, there may be a low likelihood (e.g., 40%) that the digital interaction is part of an attack. Thus, if a digital interaction exhibits behavior B, that the digital interaction also exhibits behavior A may greatly increase the likelihood that the digital interaction is part of an attack, whereas that the digital interaction also exhibits behavior C may increase that likelihood to a lesser extent. It should be appreciated that specific percentages are provided in the example of FIG. 15 merely for purposes of illustration, as other percentages may also be possible.

FIG. 16 shows an illustrative fuzzy profile 1600, in accordance with some embodiments. In this example, the fuzzy profile 1600 includes six individual behaviors A, B, C, X, Y, and Z, where behaviors A, B, and C each include an observed historical pattern, and behaviors X, Y, and Z each include a behavior observed during a current digital interaction. If a digital interaction is associated with an anchor value (e.g., IP address, account ID, etc.) exhibiting both historical patterns A and B, there may be a high likelihood (e.g., 80%) that the digital interaction is part of an attack. As discussed above in connection with FIG. 15, such a likelihood may be determined based on a percentage of malicious digital interactions that are also associated with an anchor value exhibiting both historical patterns A and B.

If a digital interaction is associated with an anchor value (e.g., IP address, account ID, etc.) exhibiting historical pattern C, and if both behaviors X and Y are observed during the current digital interaction, there may be an even higher likelihood (e.g., 98%) that the digital interaction is part of an attack. If, on the other hand, only behaviors Y and Z are observed during the current digital interaction, there may be a lower likelihood (e.g., 75%) that the digital interaction is part of an attack.

In some embodiments, one or more behaviors observed in a new digital interaction may be checked against a fuzzy profile, and a score may be computed that is indicative of a likelihood that the new digital interaction is part of an attack associated with the fuzzy profile. In this manner, an anchor value associated with the new digital interaction may be linked to a known malicious anchor value associated with the fuzzy profile.

The inventors have recognized and appreciated that the use of fuzzy profiles to link anchor values may be advantageous. For instance, fuzzy profiles may capture behavior characteristics that may be more difficult for an attacker to spoof, compared to other types of characteristics such as device characteristics. Moreover, in some embodiments, a fuzzy profile may be used across multiple web sites and/or applications. For example, when an attack occurs against a particular web site or application, a fuzzy profile may be created based on that attack (e.g., to identify linked anchor values) and may be used to detect similar attacks on a different web site or application. However, it should be appreciated that aspects of the present disclosure are not limited to the use of a fuzzy profile, as each of the techniques described herein may be used alone, or in combination with any one or more other techniques described herein.

Some retailers use Stock Keeping Units (SKUs) or other types of identifiers to identify products and/or services sold. This may allow analysis of sales data by product/service, for example, to identify historical purchase trends. In some embodiments, techniques are provided for identifying unexpected sale patterns. Although SKUs are used in some of the examples described herein, it should be appreciated that aspects of the present disclosure are not limited to the use of SKUs, as other types of identifiers for products and/or services may also be used.

The inventors have recognized and appreciated that a SKU may sometimes become incorrectly priced in a retailer's inventory management software. This may be the result of a glitch or bug in the software, or a human error. As one example, a product that normally sells for $1,200.00 may be incorrectly priced at $120.00, which may lead to a sharp increase in the number of purchases of that product. In an automated retail environment, such as e-commerce, the retailer may inadvertently allow transactions to complete and ship goods at a loss. Examples of other problems that may lead to anomalous sales data include, but are not limited to, consumers exploiting unexpected coupon code interactions, consumers violating sale policies (e.g., limit one item per customer at a discounted price), and commercial resellers attempting to take advantage of consumer-only pricing.

Accordingly, in some embodiments, techniques are provided for detecting unexpected sale patterns and notifying retailers so that any underlying problems may be corrected. For example, a security system may be programmed to monitor purchase activity (e.g., per SKU or group of SKUs) and raise an alert when significant deviation from an expected baseline is observed. The security system may use any suitable technique for detecting unexpected sale patterns, including, but not limited to, using a fuzzy profile as described herein.

The inventors have recognized and appreciated that some systems onlyanalyze historical sales data (e.g., sale patterns for previous month oryear). As a result, retailers may not be able to discover issues such asthose discussed above until the damage has been done (e.g., goodsshipped and transactions closed). Accordingly, in some embodiments,techniques are provided for analyzing sales data and alerting retailersin real time (e.g., before sending confirmations to consumers, whenpayments are still being processed, before goods are shipped, beforegoods are received by consumers, before transactions are marked closed,etc.).

In some embodiments, one or more automated countermeasures may be implemented in response to an alert. For example, a retailer may automatically freeze sales transactions that are in progress, and/or remove a SKU from the website, until an investigation is conducted. Additionally, or alternatively, one or more recommendations may be made to a retailer (e.g., based on profit/loss calculations), so that the retailer may decide to allow or block certain activities depending on projected financial impact.

In some embodiments, data relating to sales activities may be collected and stored in a database. One or more metrics may then be derived from the stored data. Examples of metrics that may be computed for a particular SKU or group of SKUs include, but are not limited to, proportion of transactions including that SKU or group of SKUs (e.g., out of all transactions at a website or group of websites), average number of items of that SKU or group of SKUs purchased in a single transaction or over a certain period of time by a single buyer, etc.
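
For illustration only, the following sketch shows how metrics along these lines might be derived from stored transaction records. The transaction shape, the helper name computeSkuMetrics, and the specific metrics are assumptions for this example, not features of any particular embodiment.

    // Minimal sketch: deriving per-SKU metrics from stored transaction data.
    // The transaction shape and metric choices here are illustrative only.
    function computeSkuMetrics(transactions, sku) {
      const withSku = transactions.filter(tx =>
        tx.items.some(item => item.sku === sku));

      // Proportion of all transactions that include the SKU.
      const proportion = transactions.length > 0
        ? withSku.length / transactions.length
        : 0;

      // Average number of items of this SKU per transaction that includes it.
      const totalUnits = withSku.reduce((sum, tx) =>
        sum + tx.items
          .filter(item => item.sku === sku)
          .reduce((s, item) => s + item.quantity, 0), 0);
      const avgUnitsPerTransaction = withSku.length > 0
        ? totalUnits / withSku.length
        : 0;

      // Average value of transactions that include the SKU.
      const avgTransactionValue = withSku.length > 0
        ? withSku.reduce((sum, tx) => sum + tx.total, 0) / withSku.length
        : 0;

      return { proportion, avgUnitsPerTransaction, avgTransactionValue };
    }

    // Example usage with two hypothetical transactions.
    const metrics = computeSkuMetrics([
      { total: 1600, items: [{ sku: 'TV-1200', quantity: 1 }] },
      { total: 45, items: [{ sku: 'HDMI-01', quantity: 3 }] },
    ], 'TV-1200');
    console.log(metrics); // { proportion: 0.5, avgUnitsPerTransaction: 1, ... }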

In some embodiments, one or more metrics derived from current sales activities may be compared against historical data. For example, JavaScript code running on a website may monitor one or more sales activities and compare one or more current metrics against historical data. However, it should be appreciated that aspects of the present disclosure are not limited to the use of JavaScript, as any suitable client-side and/or server-side programs written in any suitable language may be used to implement any one or more of the functionalities described herein.

In some embodiments, an alert may be raised if one or more current metrics represent a significant deviation from one or more historically observed baselines. The one or more metrics may be derived in any suitable manner. For instance, in some embodiments, a metric may pertain to all transactions conducted over a web site or group of web sites, or may be specific to a certain anchor value such as a certain IP address or a certain user account. Additionally, or alternatively, a metric may be per SKU or group of SKUs.

As a non-limiting example, an electronics retailer may sell a particular model of television for $1,200.00. Historical sales data may indicate one or more of the following:

-   a small percentage (e.g., 1%) of transactions site-wide include this particular model of television;
-   a large percentage (e.g., 99%) of transactions including this model of television include only one television;
-   on average, the retailer sells a moderate number (e.g., 30) of televisions of this model per month;
-   an average value of transactions including this model of television is $1,600.00 (or some other value close to the price of this model of television);
-   sales of this model of television spike during one or more specific time periods, such as on or around Black Friday or Boxing Day;
-   sales of this model of television drop during summer months;
-   etc.

In some embodiments, a system may be provided that is programmed to use historical data (e.g., one or more of the observations noted above) as a baseline to intelligently detect notable deviations. For instance, with reference to the television example described above, if the retailer's stock keeping system incorrectly priced the $1,200.00 model of television at $120.00, one or more of the following may be observed:

-   the proportion of transactions site-wide including this particular model of television increases sharply (e.g., from 1% to 4%);
-   transactions including this model of television suddenly start including multiple televisions;
-   the retailer has sold more televisions in the last 24 hours than the retailer typically sells in one month;
-   the average value of transactions including this model of television drops substantially;
-   etc.

In some embodiments, alerts may be triggered based on observations such as those described above. As one example, any one of a designated set of observations may trigger an alert. As another example, a threshold number of observations (e.g., two, three, etc.) from a designated set of observations may trigger an alert. As yet another example, one or more specific combinations of observations may trigger an alert.
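
A minimal sketch of such triggering logic appears below, assuming observations are reduced to named boolean flags; the policy modes ('any', 'threshold', 'combination') and field names are illustrative only.

    // Minimal sketch of alert triggering over a designated set of
    // observations. Observation names and policy fields are illustrative.
    function shouldAlert(observations, policy) {
      // observations: { [name: string]: boolean }
      const fired = Object.keys(observations).filter(k => observations[k]);

      switch (policy.mode) {
        case 'any':         // any one designated observation triggers an alert
          return fired.length >= 1;
        case 'threshold':   // a threshold number of observations triggers an alert
          return fired.length >= policy.minCount;
        case 'combination': // only one or more specific combinations trigger
          return policy.combinations.some(combo =>
            combo.every(name => observations[name]));
        default:
          return false;
      }
    }

    // Example: require at least two of the designated observations.
    console.log(shouldAlert(
      { proportionSpike: true, multiUnitPurchases: true, valueDrop: false },
      { mode: 'threshold', minCount: 2 }
    )); // true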

In some embodiments, when an alert is raised, a retailer may be notified in real time. In this manner, the retailer may be able to investigate and correct one or more errors that led to the anomalous sales activities before significant damage is done to the retailer's business.

Although an example is described above relating to mispriced items, it should be appreciated that the techniques described herein may be used in other scenarios as well. For example, one or more of the techniques described herein may be used to detect abuse of sale prices, new customer loss-leader deals, programming errors relating to certain coupon codes, resellers buying out stock, etc. Any of these and/or other anomalies may be detected from a population of transactions.

In some embodiments, an online behavior scoring system may calculate a risk score for an anchor value, where the anchor value may be associated with an entity such as a human user or a bot. The risk score may indicate a perceived likelihood that the associated entity is malicious (e.g., being part of an attack). Risk scores may be calculated using any suitable combination of one or more techniques, including, but not limited to:

-   Analyzing traffic volumes over one or more dimensions such as IP, UID stored in a cookie, device fingerprint, etc. Observations may be compared against a baseline, which may be derived from one or more legitimate samples.
-   Analyzing historical access patterns. For example, a system may detect a new user ID and device association (e.g., a user logging in from a newly purchased mobile phone). The system may observe a rate at which requests associated with the user ID are received from the new device, and may compare the newly observed rate against a rate at which requests associated with the user ID were received from a previous device. Additionally, or alternatively, the system may observe whether requests are distributed in a similar manner throughout different times of day (e.g., whether more or fewer requests are received at a certain time of day).
-   Checking reputation of origins, for example, using honeypots, IP blacklists, and/or TOR lists.
-   Using one-time-use tokens to detect replays of old communication.
-   Altering forms to detect GUI replay or screen macro agents, for example, by adding or removing fields, altering x/y coordinates of fields, etc.

The inventors have recognized and appreciated that a sophisticated attacker may be able to detect when some of the above-described techniques are deployed, and to react accordingly to avoid appearing suspicious. Accordingly, in some embodiments, techniques are provided for monitoring online behavior in a manner that is transparent to entities being monitored.

In some embodiments, one or more security probes may be deployed dynamically to obtain information regarding an entity. For instance, a security probe may be deployed only when a security system determines that there is sufficient value in doing so (e.g., using an understanding of user behavior). As an example, a security probe may be deployed when a level of suspicion associated with the entity is sufficiently high to warrant an investigation (e.g., when recent activities of an entity represent a significant deviation from an activity pattern observed in the past for that entity). The inventors have recognized and appreciated that by reducing a rate of deployment of security probes for surveillance, it may be more difficult for an attacker to detect the surveillance and/or to discover how the surveillance is conducted. As a result, the attacker may not be able to evade the surveillance effectively.

FIG. 17 shows an illustrative process 1700 for dynamic security probe deployment, in accordance with some embodiments. For instance, the process 1700 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1B) to determine if and when to deploy one or more security probes.

At act 1705, the security system may receive data regarding a digital interaction. For instance, as discussed in connection with FIG. 1B, the security system may receive log files comprising data recorded from digital interactions. The security system may process the received data and store salient information into an appropriate data structure, such as the illustrative data structure 220 shown in FIG. 2B. The stored information may be used, at act 1710, to determine if the digital interaction is suspicious.

Any suitable technique may be used to determine if the digital interaction is suspicious. For instance, the illustrative process 1400 shown in FIG. 14 may be used to determine if the digital interaction matches a fuzzy profile that stores anomalous attributes. If a resulting penalty score is below a selected threshold, the security system may proceed to act 1715 to perform standard operation. Otherwise, the security system may proceed to act 1720 to deploy a security probe, and data collected by the security probe from the digital interaction may be analyzed at act 1725 to determine if further action is appropriate.
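
For illustration, the gate described above might be sketched as follows. The computePenaltyScore and deployProbe functions are placeholder stubs standing in for the fuzzy-profile matching and probe machinery; they are assumptions for this example, not a prescribed API.

    // Minimal sketch of the probe-deployment gate described above. The
    // scoring and probe functions are placeholder stubs.
    function computePenaltyScore(interaction, profile) {
      // Placeholder: count profile attributes the interaction matches.
      return profile.anomalousAttributes
        .filter(attr => interaction[attr.name] === attr.value).length;
    }

    function deployProbe(interaction) {
      // Placeholder for sending a probe to the client and collecting data.
      return { probed: true, interactionId: interaction.id };
    }

    function handleInteraction(interaction, profile, threshold) {
      const penalty = computePenaltyScore(interaction, profile);
      if (penalty < threshold) {
        return { action: 'standard-operation' };          // cf. act 1715
      }
      const probeData = deployProbe(interaction);          // cf. act 1720
      return { action: 'analyze-probe-data', probeData };  // cf. act 1725
    }

    // Example usage with a hypothetical profile and interaction.
    const profile = { anomalousAttributes: [{ name: 'userAgent', value: 'BadBot/1.0' }] };
    console.log(handleInteraction({ id: 7, userAgent: 'BadBot/1.0' }, profile, 1));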

The penalty score threshold may be chosen in any suitable manner. For instance, the inventors have recognized and appreciated that, while it may be desirable to collect more data from digital interactions, the security system may have limited resources such as network bandwidth and processing power. Therefore, to conserve resources, security probes should be deployed judiciously. Moreover, the inventors have recognized and appreciated that frequent deployment of probes may allow an attacker to study the probes and learn how to evade detection. Accordingly, in some embodiments, a penalty score threshold may be selected to provide a desired tradeoff.

It should be appreciated that aspects of the present disclosure are not limited to the use of a fuzzy profile to determine if and when to deploy a security probe. Additionally, or alternatively, a profile associated with an anchor value observed from the digital interaction may be used to determine if the digital interaction is sufficiently similar to prior digital interactions from which the anchor value was observed. If it is determined that the digital interaction is not sufficiently similar to prior digital interactions from which the anchor value was observed, one or more security probes may be deployed to gather additional information from the digital interaction.

In some embodiments, a security system may be configured to segment traffic over one or more dimensions, including, but not limited to, IP Address, XFF IP Address, C-Class IP Address, Input Signature, Account ID, Device ID, User Agent, etc. For instance, each digital interaction may be associated with one or more anchor values, where each anchor value may correspond to a dimension for segmentation. This may allow the security system to create segmented lists. As one example, a segmented list may be created that includes all traffic reporting Chrome as the user agent. Additionally, or alternatively, a segmented list may be created that includes all traffic reporting Chrome Version 36.0.1985.125 as the user agent. In this manner, segmented lists may be created at any suitable granularity. As another example, a segmented list may include all traffic reporting Mac OS X 10.9.2 as the operating system. Additionally, or alternatively, a segmented list may be created that includes all traffic reporting Chrome Version 36.0.1985.125 as the user agent and Mac OS X 10.9.2 as the operating system. In this manner, segmented lists may be created with any suitable combination of one or more anchor values.

In some embodiments, one or more metrics may be collected and stored for a segmented list. For instance, a segmented list (e.g., all traffic associated with a particular IP address or block of IP addresses) may be associated with a segment identifier, and one or more metrics collected for that segmented list may be stored in association with the segment identifier. Examples of metrics that may be collected include, but are not limited to, average risk score, minimum risk score, maximum risk score, number of accesses within some window of time (e.g., the last 5 minutes, 10 minutes, 15 minutes, 30 minutes, 45 minutes, hour, 2 hours, 3 hours, 6 hours, 12 hours, 24 hours, day, 2 days, 3 days, 7 days, two weeks, etc.), geographic data, etc.
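
As a rough sketch, per-segment metric tracking under these assumptions might look like the following; the segment-key format, field names, and in-memory Map store are all illustrative choices.

    // Minimal sketch of per-segment metric tracking. A segment identifier
    // is derived from a combination of one or more anchor values.
    function segmentId(anchors) {
      // e.g., { userAgent: 'Chrome/36.0.1985.125', os: 'Mac OS X 10.9.2' }
      return Object.keys(anchors).sort()
        .map(k => `${k}=${anchors[k]}`).join('|');
    }

    const segments = new Map();

    function recordRiskScore(anchors, score, now = Date.now()) {
      const id = segmentId(anchors);
      const seg = segments.get(id) ??
        { count: 0, sum: 0, min: Infinity, max: -Infinity, accesses: [] };
      seg.count += 1;
      seg.sum += score;
      seg.min = Math.min(seg.min, score);
      seg.max = Math.max(seg.max, score);
      seg.accesses.push(now);
      segments.set(id, seg);
      return id;
    }

    // Number of accesses within a trailing window (e.g., last 15 minutes).
    function accessesWithin(id, windowMs, now = Date.now()) {
      const seg = segments.get(id);
      return seg ? seg.accesses.filter(t => now - t <= windowMs).length : 0;
    }

    const id = recordRiskScore({ userAgent: 'Chrome/36.0.1985.125' }, 72);
    console.log(accessesWithin(id, 15 * 60 * 1000)); // 1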

In some embodiments, a security system may use one or more metrics stored for a segmented list to determine whether a security probe should be deployed. For example, a security probe may be deployed when one or more metrics exceed corresponding thresholds. The security system may select one or more probes based on a number of different factors, such as which one or more metrics have exceeded the corresponding thresholds, by how much the one or more metrics have exceeded the corresponding thresholds, and/or which segmented list is implicated.

Thresholds for metrics may be determined in any suitable manner. For instance, in some embodiments, one or more human analysts may examine historical data (e.g., general population data, data relating to traffic that turned out to be associated with an attack, data relating to traffic that was not identified as being associated with an attack, etc.), and may select the thresholds based on the historical data (e.g., to achieve a desired tradeoff between false positive errors and false negative errors). Additionally, or alternatively, one or more techniques described below in connection with threshold-type sensors may be used to select thresholds automatically.

The inventors have recognized and appreciated that some online behavior scoring systems use client-side checks to collect information. In some instances, such checks are enabled in a client during many interactions, which may give an attacker clear visibility into how the online behavior scoring system works (e.g., what information is collected, what tests are performed, etc.). As a result, an attacker may be able to adapt and evade detection. Accordingly, in some embodiments, techniques are provided for obfuscating client-side functionalities. Used alone or in combination with dynamic probe deployment (which may reduce a number of probes deployed to, for example, one in hundreds of thousands of interactions), client-side functionality obfuscation may reduce a likelihood of malicious entities detecting surveillance and/or discovering how the surveillance is conducted. For instance, client-side functionality obfuscation may make it difficult for a malicious entity to test a probe's behavior in a consistent environment.

FIG. 18 shows an illustrative cycle 1800 for updating one or more segmented lists, in accordance with some embodiments. In this example, one or more handlers may be programmed to read from a segmented list (e.g., by reading one or more metrics associated with the segmented list) and determine whether and/or how a probe should be deployed. Examples of handlers include, but are not limited to, an initialization handler programmed to handle initialization requests and return HTML code, and/or an Ajax (asynchronous JavaScript and XML) handler programmed to respond to Ajax requests. Additionally, or alternatively, one or more handlers (e.g., a score handler programmed to calculate risk scores) may be programmed to write to a segmented list (e.g., by updating one or more metrics associated with the segmented list, such as average, minimum, and/or maximum risk scores). However, aspects of the present disclosure are not limited to the use of handlers, as other types of programs may also be used to implement any of the functionalities described herein.

In some embodiments, a write to a segmented list may trigger one or more reads from the segmented list. For example, whenever a score handler updates a risk score metric, a cycle may be started and an initialization handler and/or Ajax handler may read one or more segmented lists affected by the update. In this manner, whenever a new event takes place that affects a metric, a fresh determination may be made as to whether to deploy one or more probes. However, aspects of the present disclosure are not limited to the implementation of such cycles, as in some embodiments a segmented list may be read periodically, regardless of observations from new events.

In some embodiments, a probe may be deployed to one or more selected interactions only, as opposed to all interactions in a segmented list. For example, a probe may be deployed only to one or more suspected members in a segmented list (e.g., a member for which one or more measurements are at or above certain alert levels). Once a result is received from the probe, the result may be stored in association with the member and/or the segmented list, and the probe may not be sent again. In this manner, a probe may be deployed only a limited number of times, which may make it difficult for an attacker to detect what information the probe is collecting, or even the fact that a probe has been deployed. However, it should be appreciated that aspects of the present disclosure are not limited to such targeted deployment of probes, as in some embodiments a probe may be deployed to every interaction, or one or more probes may be deployed in a targeted fashion, while one or more other probes may be deployed to every interaction.

In some embodiments, a probe may use markup (e.g., an image tag) already present on a web page to perform one or more functions. For example, any markup that requires a user agent to perform a computational action may be used as a probe. Additionally, or alternatively, a probe may include additional markup, JavaScript, and/or Ajax calls to a server. Some non-limiting examples of probes are described below; a simplified sketch of one such probe follows the list.

-   IsRealJavaScript
    -   One or more JavaScript statements to perform a function may be included on a web page, where a result of executing the function is to be sent back to a server. If the result is not received, or is received but not correct, it may be determined that the client is not running a real JavaScript engine.
-   IsRunningHeadlessBrowser
    -   A widget may be programmed to request graphics card information, viewport information, and/or window information (e.g., window.innerHeight, document.body.clientWidth, etc.). Additionally, or alternatively, the widget may be programmed to watch for mouse movement inside a form. If one or more results are missing or anomalous, it may be determined that the client is running a headless browser.
-   IsCookieEnabled
    -   One or more cookies with selected names and values may be set in a user's browser, where the one or more values are to be sent back to a server. If the one or more values are not received, it may be determined that the browser is not allowing cookies.
-   IsUASpoofing
    -   One or more JavaScript statements that behave in a certain recognizable manner for a purported browser type and/or version may be included. If the expected anomalous behavior is not seen, it may be determined that the user agent is being spoofed.
-   IsDeviceIDSpoofing
    -   The inventors have recognized and appreciated that a device ID may be a dynamic combination of certain elements (e.g., relating to browser and/or hardware characteristics). A formula for deriving the device ID may be altered during a probe (e.g., increasing/decreasing length, and/or adding/omitting one or more elements). If the newly derived device ID is not as expected, it may be determined that the device ID is being spoofed.
-   IsReadingIDs
    -   One or more values may be modified one or more times during a digital interaction. For example, one or more system form IDs may be modified before being delivered as HTML, and again after associated JavaScript code loads. Depending on which version of the one or more IDs is obtained by an attacker, a security system may deduce when in a transaction cycle the attacker is reading the one or more IDs.
-   IsFabricatingInputBehavior
    -   Software code for a widget may be randomly modified to use different symbols for key and/or mouse events. If one or more symbols do not match, it may be determined that the input data has been fabricated.
-   IsReferencingSystemJS
    -   One or more system JavaScript functions may be duplicated and hidden, and one or more alarms may be added to one or more original system functions. If an alarm is triggered, it may be determined that a third party is invoking a system function.
-   IsReplayingGUIMouseEvents
    -   An Ajax response may be altered to include a Document Object Model (DOM) manipulation instruction to manipulate a GUI field or object. As one example, the DOM manipulation instruction may move a GUI field (e.g., a required field such as a submit button) to a different location in a GUI, and place an invisible yet fully functional field (e.g., another submit button) at the original location. If a form is submitted using the invisible field, it may be determined that the GUI events are a result of a replay or macro. As another example, the GUI field may be moved, but there may be no replacement field at the original location. If a “click” event nonetheless occurs at the original location, it may be determined that the GUI events are a result of a replay or macro. As yet another example, the DOM manipulation instruction may replace a first GUI field (e.g., a “Submit” button) with a second GUI field of the same type (e.g., a “Submit1” button). A human user completing the form legitimately may click the second GUI field, which is visible. A bot completing the form using a replay script may instead “click” the first GUI field, which is invisible.
-   IsReplayingGUIKeyEvents
    -   Similar to IsReplayingGUIMouseEvents, this probe may hide a text input field, and place a differently named field at the original location. If the invisible field receives a key event, it may be determined that the event is a replay.
-   IsReplayingRecordedAjaxCalls
    -   This probe may change an endpoint address of an Ajax call. If an old address is used for an Ajax call, it may be determined that the Ajax call is a replay.
-   IsAssumingAjaxBehavior
    -   This probe may instruct a client to make multiple Ajax calls and/or one or more delayed Ajax calls. If an unexpected Ajax behavior pattern is observed, it may be determined that an attacker is fabricating an Ajax behavior pattern.
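
As a concrete illustration, below is a simplified browser-side sketch along the lines of the IsCookieEnabled probe above. The cookie name, token, and reporting endpoint (/probe-result) are hypothetical, and a deployed probe would typically be obfuscated as discussed further below.

    // Simplified sketch of an IsCookieEnabled-style probe (browser-side).
    // Cookie name, token, and reporting endpoint are illustrative only.
    (function () {
      const name = 'probe_' + Math.random().toString(36).slice(2, 10);
      const token = Math.random().toString(36).slice(2, 10);

      // Attempt to set a cookie with a selected name and value.
      document.cookie = name + '=' + token + '; path=/';

      // Read it back; if the value is missing, cookies are not allowed.
      const match = document.cookie.match(new RegExp('(?:^|; )' + name + '=([^;]*)'));
      const cookieEnabled = !!match && match[1] === token;

      // Report the observation to the server for scoring.
      const xhr = new XMLHttpRequest();
      xhr.open('POST', '/probe-result', true);
      xhr.setRequestHeader('Content-Type', 'application/json');
      xhr.send(JSON.stringify({ probe: 'IsCookieEnabled', cookieEnabled: cookieEnabled }));
    })();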

Although several examples of probes are discussed above, it should be appreciated that aspects of the present disclosure are not limited to the use of any one probe or combination of probes, or any probe at all. For instance, in some embodiments, a probe may be deployed and one or more results of the probe may be logged (e.g., in association with a segment identifier and/or alongside one or more metrics associated with the segment identifier). Such a result may be used to determine if a subsequent probe is to be deployed. Additionally, or alternatively, such a result may be used to facilitate scoring and/or classifying future digital interactions.

In one example, a same form of input pattern may be observed several times in a short window of time, which may represent an anomalously high rate. Additionally, it may be observed that a same user agent is involved in all or a significant portion of the digital interactions exhibiting the suspicious input pattern. This may indicate a potential high-volume automated attack, and may cause one or more probes to be deployed to obtain more information about a potential automation method.

In some embodiments, multiple security probes may be deployed, where each probe may be designed to discover different information. For example, information collected by a probe may be used by a security system to inform the decision of which one or more other probes to deploy next. In this manner, the security system may be able to gain an in-depth understanding into network traffic (e.g., website and/or application traffic). For instance, the security system may be able to classify traffic in ways that facilitate identification of malicious traffic, define with precision what type of attack is being observed, and/or discover that some suspect behavior is actually legitimate. These results may indicate not only a likelihood that certain traffic is malicious, but also a likely type of malicious traffic. Therefore, such results may be more meaningful than just a numeric score. For instance, if multiple probe results indicate a digital interaction is legitimate, a determination may be made that an initial identification of the digital interaction as being suspicious may be a false positive identification.

FIG. 19 shows an illustrative process 1900 for dynamically deploying multiple security probes, in accordance with some embodiments. Like the illustrative process 1700 of FIG. 17, the process 1900 may be performed by a security system (e.g., the illustrative security system 14 shown in FIG. 1B) to determine if and when to deploy one or more security probes.

Acts 1905, 1910, 1915, and 1920 of the process 1900 may be similar to acts 1705, 1710, 1715, and 1720 of the process 1700, respectively. At act 1925, the security system may analyze data collected by a probe of a first type (e.g., Probe 1) deployed at act 1920 to determine what type of probe to further deploy to the digital interaction. For example, if a result of Probe 1 is positive (e.g., a suspicious pattern is identified), a probe of a second type (e.g., Probe 2) may be deployed at act 1930 to further investigate the digital interaction. At act 1940, the security system may analyze data collected by Probe 2 to determine what, if any, action may be appropriate.

If instead the result of Probe 1 is negative (e.g., no suspicious pattern identified) at act 1925, a probe of a third type (e.g., Probe 3) may be deployed at act 1935 to further investigate the digital interaction. At act 1945, the security system may analyze data collected by Probe 3 to determine what, if any, action may be appropriate.

As an example, a first probe may be deployed to verify if the client is running JavaScript. This probe may include a JavaScript snippet, and may be deployed only in one or a small number of suspicious interactions, to make it more difficult for an attacker to detect the probe. If a result of the first probe indicates that the client is running JavaScript, the security system may determine that an attacker may be employing some type of GUI macro, and a subsequent probe may be sent to confirm this hypothesis (e.g., by altering a layout of a form). If a result of the first probe indicates that the client is not running JavaScript, the security system may determine that an attacker may be employing some type of CLI script, and a subsequent probe may be sent to further discover one or more script capabilities and/or methods used to spoof form input. This decision-making pattern may be repeated until all desired information has been collected about the potential attack.
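
A minimal sketch of this branching logic appears below; the probe names and the result fields are assumptions for illustration only.

    // Minimal sketch of chained probe deployment. Probe names and result
    // fields are illustrative; real probes would be obfuscated and rare.
    function nextProbe(previousResult) {
      if (previousResult == null) {
        return 'IsRealJavaScript';            // first probe: verify JS engine
      }
      if (previousResult.probe === 'IsRealJavaScript') {
        return previousResult.isRunningJavaScript
          ? 'IsReplayingGUIMouseEvents'       // suspect a GUI macro
          : 'IsFabricatingInputBehavior';     // suspect a CLI script
      }
      return null;                            // enough information collected
    }

    // Example walk through the decision chain.
    console.log(nextProbe(null));
    console.log(nextProbe({ probe: 'IsRealJavaScript', isRunningJavaScript: true }));
    console.log(nextProbe({ probe: 'IsRealJavaScript', isRunningJavaScript: false }));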

It should be appreciated that aspects of the present disclosure are not limited to the use of the illustrative decision process described above. For instance, FIG. 20 shows an example of a decision tree that may be used by a security system to determine whether to deploy a probe and/or which one or more probes are to be deployed, in accordance with some embodiments.

In some embodiments, some or all JavaScript code may be obfuscated before being sent to a client. For instance, one or more obfuscation techniques may be used to hide logic for one or more probes. Examples of such techniques include, but are not limited to, symbol renaming and/or re-ordering, code minimization, logic shuffling, and fabrication of meaningless logic (e.g., additional decision and control statements that are not required for the probe to function as intended). The inventors have recognized and appreciated that one or more of these and/or other techniques may be applied so that a total amount of code (e.g., in terms of number of statements and/or number of characters) does not increase significantly despite the inclusion of one or more probes, which may reduce the likelihood of an attacker discovering a probe. However, it should be appreciated that aspects of the present disclosure are not limited to the use of any probe obfuscation technique.

Some security systems use threshold-type sensors to trigger actions. For instance, a sensor may be set up to monitor one or more attributes of an entity and raise an alert when a value of an attribute falls above or below an expected threshold. Similarly, an expected range may be used, and an alert may be raised when the value of the attribute falls outside the expected range. The threshold or range may be determined manually by one or more data scientists, for example, by analyzing a historical data set to identify a set of acceptable values and setting the threshold or range based on the acceptable values.

The inventors have recognized and appreciated some disadvantages of the above-described approach for tuning sensors. For example:

-   The above-described approach assumes that a historical data set already exists or will be collected. Depending on the volume of digital interactions, it may take a month or more to collect a data set of an appropriate sample size.
-   In some instances, significant processing and modeling may be performed on the dataset, which may take more than one week.
-   The analysis of the historical data set may require a significant amount of human involvement.

Accordingly, in some embodiments, a security system is provided that is programmed to monitor one or more digital interactions and tune a sensor based on data collected from the digital interactions. Such monitoring and tuning may be performed with or without human involvement. In some embodiments, the monitoring and tuning may be performed in real time, which may allow the security system to react to an attack as soon as the attack is suspected, rather than waiting for data to be accumulated and analyzed over several weeks. In this manner, one or more actions may be taken while the attack is still on-going to stop the attack and/or control damages. However, it should be appreciated that real time tuning is not required, as data may alternatively, or additionally, be accumulated and analyzed after the attack.

In some embodiments, a security system may be configured to use one or more sensors to collect data from one or more digital interactions. The system may analyze the collected data to identify a baseline of expected behavior, and then use the identified baseline to tune the one or more sensors, thereby providing a feedback loop. For example, in some embodiments, the system may accumulate the data collected by the one or more sensors over time and use the accumulated data to build a model of baseline behavior.

In some embodiments, data collected by one or more sensors may be segmented. The inventors have recognized and appreciated that segmentation may allow a security system to deal with large amounts of data more efficiently. For instance, the security system may group observed entities and/or digital interactions into buckets based on certain shared characteristics. As one example, each entity or digital interaction may be associated with one of several buckets based on a typing speed detected for the entity or digital interaction. The buckets may be chosen in any suitable manner. For instance, more buckets may be used when finer-grained distinctions are desirable. In one example, an entity or digital interaction may be associated with one of four different buckets based on typing speed: 0-30 words per minute, 31-60 words per minute, 61-90 words per minute, and more than 90 words per minute. Other configurations of buckets are also possible, as aspects of the present disclosure are not limited to the use of any particular configuration. Also, it should be appreciated that segmentation may be performed on any type of measurements, including, but not limited to, typing speed, geo-location, user agent, and/or device ID.
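
For illustration, bucket assignment along the lines of the four-bucket example above might be sketched as follows; the boundaries simply mirror that example.

    // Minimal sketch: assigning a digital interaction to a typing-speed
    // bucket. Boundaries follow the four-bucket example above.
    function typingSpeedBucket(wordsPerMinute) {
      if (wordsPerMinute <= 30) return '0-30 wpm';
      if (wordsPerMinute <= 60) return '31-60 wpm';
      if (wordsPerMinute <= 90) return '61-90 wpm';
      return '90+ wpm';
    }

    console.log(typingSpeedBucket(45)); // '31-60 wpm'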

In some embodiments, data collected by one or more sensors may be quantized to reduce the number of possible values for a particular attribute, which may allow a security system to analyze the data more efficiently. In some embodiments, quantization may be performed using a hash-modding process, which may involve hashing an input value and performing a modulo operation on the resulting hash value. However, it should be appreciated that aspects of the present disclosure are not limited to the use of hash-modding, as other quantization methods may also be suitable.

In some embodiments, a hashing technique may be used that produces a same hash value every time given a same input value, and the hash value may be such that it is difficult to reconstruct the input from the hash value alone. Such a hash function may allow comparison of attribute values without exposing actual data. For example, a security system may hash a credit card number to produce an alphanumeric string such as the following:

12KAY8XOOW0881PWBM81KJCUYPDXHG

If hashed again in the future, the same credit card number may produce the same hash value. Furthermore, the hash function may be selected so that no two inputs are mapped to the same hash value, or the number of such pairs is small. As a result, the likelihood of two different credit card numbers producing the same hash value may be low, and the security system may be able to verify whether a newly submitted credit card number is the same as a previously submitted credit card number by simply computing a hash value of the newly submitted credit card number and comparing the computed hash value against a stored hash value of the previously submitted credit card number, without having to store the previously submitted credit card number.
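
A minimal sketch of this comparison, assuming Node's built-in crypto module and SHA-256 as the hash function (the disclosure does not prescribe a particular function):

    // Minimal sketch: comparing a newly submitted value against a stored
    // hash, without storing the original value. SHA-256 is an assumed choice.
    const crypto = require('crypto');

    function hashValue(input) {
      return crypto.createHash('sha256').update(String(input)).digest('hex');
    }

    const storedHash = hashValue('4111111111111111'); // stored at first sight

    // Later, verify a new submission without the original number on hand.
    console.log(hashValue('4111111111111111') === storedHash); // true
    console.log(hashValue('4000056655665556') === storedHash); // false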

The inventors have recognized and appreciated that a hash function may be used to convert input data (including non-numerical input data) into numerical values, while preserving a distribution of the input data. For example, a distribution of output hash values may approximate the distribution of the input data.

In some embodiments, a modulo operation (e.g., mod M, where M is a large number) may be applied to a numerical value resulting from hashing or otherwise converting an input value. This may reduce the number of possible output values (e.g., to M, if the modulo operation is mod M). Some information on the distribution of the input data may be lost, as multiple input values may be mapped to the same number under the modulo operation. However, the inventors have recognized and appreciated that sufficient information may be retained for purposes of detecting anomalies.
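
Combining the two steps, a hash-modding sketch under the same assumptions (SHA-256, an illustrative modulus M) might look like the following.

    // Minimal sketch of hash-modding: hash an input, then reduce the number
    // of possible outputs with a modulo operation. SHA-256 and M = 1024 are
    // illustrative choices.
    const crypto = require('crypto');

    function hashMod(input, m) {
      const hex = crypto.createHash('sha256').update(String(input)).digest('hex');
      // Use BigInt so the full hash participates in the modulo operation.
      return Number(BigInt('0x' + hex) % BigInt(m));
    }

    // Two different inputs usually land in different buckets...
    console.log(hashMod('22.231.113.64', 1024));
    console.log(hashMod('194.66.82.11', 1024));
    // ...while a repeated input always lands in the same bucket.
    console.log(hashMod('22.231.113.64', 1024) === hashMod('22.231.113.64', 1024)); // true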

In some embodiments, a hash-modding process may be applied in analyzing network addresses. The addresses may be physical addresses and/or logical addresses, as aspects of the present disclosure are not limited to the use of hash-modding to analyze any particular type of input data. The inventors have recognized and appreciated that some network addresses are long. For example, an Internet Protocol version 4 (IPv4) address may include 32 bits, while an Internet Protocol version 6 (IPv6) address may include 128 bits (e.g., eight groups of four hexadecimal digits). The inventors have recognized and appreciated that comparing such addresses against each other (e.g., comparing a currently observed address against a set of previously observed addresses) may require a significant amount of time and/or processing power. Therefore, it may be beneficial to reduce the length of each piece of data to be compared, while preserving the salient information contained in the data.

In one illustrative example, the following IP addresses may be observed.

22.231.113.64

194.66.82.11

These addresses may be hashed to produce the following values, respectively.

9678a5be1599cb7e9ea7174aceb6dc93

6afd70b94d389a30cb34fb7f884e9941

In some embodiments, instead of comparing the input IP addresses against each other, or the hash values against each other, a security system may only compare portions of the hash values. For instance, a security system may extract one or more digits from each hash value, such as one or more least significant digits (e.g., one, two, three, four, five, six, seven, eight, nine, ten, etc.), and compare the extracted digits. In the above example, two least significant digits may be extracted from each hash value, resulting in the values 93 and 41, respectively. It may be more efficient to compare 93 against 41, as opposed to comparing 22.231.113.64 against 194.66.82.11.

The extraction of one or more least significant digits may be equivalent to a modulo operation. For example, extracting one least significant hexadecimal digit may be equivalent to mod 16, extracting two least significant hexadecimal digits may be equivalent to mod 256, etc. However, it should be appreciated that aspects of the present disclosure are not limited to the use of base-16 numbers, as one or more other numeral systems (e.g., base 2, base 8, base 10, base 64, etc.) may be used instead of, or in addition to, base 16.
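
Under these assumptions, the equivalence can be checked directly; the sketch below reuses the two example hash values above.

    // Minimal sketch: extracting least significant hex digits is equivalent
    // to a modulo operation on the hash value (two hex digits <=> mod 256).
    function lastHexDigits(hashHex, n) {
      return hashHex.slice(-n);
    }

    const a = '9678a5be1599cb7e9ea7174aceb6dc93';
    const b = '6afd70b94d389a30cb34fb7f884e9941';

    console.log(lastHexDigits(a, 2)); // '93'
    console.log(lastHexDigits(b, 2)); // '41'

    // Equivalence check: parsing the digits matches taking the hash mod 256.
    console.log(parseInt(lastHexDigits(a, 2), 16) === Number(BigInt('0x' + a) % 256n)); // true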

The inventors have recognized and appreciated that, if the extracted digits for two input IP addresses are different, the security system may infer, with 100% confidence, that the two input IP addresses are different. Thus, hash-modding may provide an efficient way to confirm that two input IP addresses are different. The inventors have further recognized and appreciated that, if the extracted digits for two input IP addresses are the same, the security system may infer, with some level of confidence, that the two input IP addresses are the same.

In some embodiments, a level of confidence that two input IP addresses are the same may be increased by extracting and comparing more digits. For instance, in response to determining that the extracted digits for two input IP addresses are the same, two more digits may be extracted from each hash value and compared. This may be repeated until a suitable stopping condition is reached, for example, when the newly extracted digits are different, or when some threshold number of digits have been extracted. The threshold number may be selected to provide a desired level of confidence that the two input IP addresses are the same. In this manner, additional processing to extract and compare more digits may be performed only if the processing that has been done does not yield a definitive answer. This may provide improved efficiency. However, it should be appreciated that aspects of the present disclosure are not limited to extracting and comparing digits in two-digit increments, as in some embodiments extraction and comparison may be performed in one-digit increments, three-digit increments, four-digit increments, etc., or in some non-uniform manner. Furthermore, in some embodiments, all digits may be extracted and compared at once, with no incremental processing.
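
A sketch of this incremental scheme, assuming two-hexadecimal-digit increments and an eight-digit stopping threshold (both illustrative):

    // Minimal sketch of incremental comparison: compare a few trailing hex
    // digits at a time, widening only while the digits keep matching.
    function probablySameHash(hashA, hashB, step = 2, maxDigits = 8) {
      for (let n = step; n <= maxDigits; n += step) {
        if (hashA.slice(-n) !== hashB.slice(-n)) {
          return false;           // definitive: the underlying inputs differ
        }
      }
      return true;                // same up to maxDigits: likely the same input
    }

    console.log(probablySameHash(
      '9678a5be1599cb7e9ea7174aceb6dc93',
      '6afd70b94d389a30cb34fb7f884e9941')); // false after the first two digits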

The inventors have recognized and appreciated that observed IP addresses may cluster around certain points. For instance, a collection of IP addresses may share a certain prefix. An example of clustered addresses is shown below:

1.22.231.113.64

1.22.231.113.15

1.22.231.113.80

1.22.231.113.80

1.22.231.113.52

The inventors have further recognized and appreciated that, by hashing IP addresses, the observations may be spread more evenly across a number line. For example, the following three addresses may be spread out after hashing, even though they share nine out of eleven digits.

1.22.231.113.64

1.22.231.113.15

1.22.231.113.52

On the other hand, the following two addresses may be hashed to the same value because they are identical, and that hash value may be spaced apart from the hash values for the above three addresses.

1.22.231.113.80

1.22.231.113.80

In some embodiments, IP addresses may be hashed into a larger space, for example, to spread out the addresses more evenly, and/or to decrease the likelihood of collisions. For instance, a 32-bit IPv4 address may be hashed into a 192-bit value, and likewise for a 128-bit IPv6 address. However, it should be appreciated that aspects of the present disclosure are not limited to the use of 192-bit hash values. Moreover, any suitable hash function may be used, including, but not limited to, MD5, MD6, SHA-1, SHA-2, SHA-3, etc.

In some embodiments, hash-modding may be used to analyze any suitable type of input data, in addition to, or instead of, IP addresses. The inventors have recognized and appreciated that hash-modding may provide a variable resolution with variable accuracy, which may allow storage requirements and/or efficiency to be managed. For instance, in some embodiments, a higher resolution (e.g., extracting and comparing more digits) may provide more certainty about an observed behavior, but even a lower resolution may provide sufficient information to label the observed behavior. For example, even with a relatively low resolution of 10 bits (and thus 2^10 = 1024 possible output values), a security system may be able to differentiate, with a reasonable level of certainty, whether a user is typing the same password 10 times, or trying 10 different passwords, because the likelihood of 10 randomly chosen passwords all having the same last 10 bits after hash-modding may be sufficiently low.

Although various techniques are described above for modeling any type of input data as a numerical data set, it should be appreciated that such examples are provided solely for purposes of illustration, and that other implementations may be possible. For instance, although a hash function may be used advantageously to anonymize input data, one or more other functions (e.g., a one-to-one function with numerical output values) may, alternatively, or additionally, be used to convert input data. Moreover, in some embodiments, a modulo operation may be performed directly on an input, without first hashing the input (e.g., where the input is already a numerical value). However, it should be appreciated that aspects of the present disclosure are not limited to the use of a modulo operation. One or more other techniques for dividing numerical values into buckets may be used instead of, or in addition to, a modulo operation.

In some embodiments, a security system may create a feedback loop to gain greater insight into historical trends. For example, the system may adapt a baseline for expected behavior and/or anomalous behavior (e.g., thresholds for expected and/or anomalous values) based on current population data and/or historical data. Thus, a feedback loop may allow the system to “teach” itself what an anomaly is by analyzing historical data.

As one example, a system may determine from historical data that a particular user agent is associated with a higher risk for fraud, and that the user agent makes up only a small percentage (e.g., 1%) of total traffic. If the system detects a dramatic increase in the percentage of traffic involving that user agent in a real-time data stream, the system may determine that a large-scale fraud attack is taking place. The system may continually update an expected percentage of traffic involving the user agent based on what the system observes over time. This may help to avoid false positives (e.g., resulting from the user agent becoming more common among legitimate digital interactions) and/or false negatives (e.g., resulting from the user agent becoming less common among legitimate digital interactions).

As another example, the system may determine from historical data that a vast majority of legitimate digital interactions have a recorded typing speed between 30 and 80 words per minute. If the system detects that a large number of present digital interactions have an improbably high typing speed, the system may determine that a large-scale fraud attack is taking place. The system may continually update an expected range of typing speed based on what the system observes over time. For example, at any given point in time, the expected range may be determined as a range that is centered at an average (e.g., mean, median, or mode) and just large enough to capture a certain percentage of all observations (e.g., 95%, 98%, 99%, etc.). Other techniques for determining an expected range may also be used, as aspects of the present disclosure are not limited to any particular manner of implementation.
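
One simple way to derive such a range, assuming the capture-a-percentage approach described above, is to trim equal tails from the sorted observations; the function name and sample data below are illustrative.

    // Minimal sketch: derive an expected range that captures a given
    // percentage of observations (e.g., 95%), trimming equal tails.
    function expectedRange(observations, capture = 0.95) {
      const sorted = [...observations].sort((x, y) => x - y);
      const tail = (1 - capture) / 2;
      const lo = sorted[Math.floor(tail * (sorted.length - 1))];
      const hi = sorted[Math.ceil((1 - tail) * (sorted.length - 1))];
      return { lo, hi };
    }

    const speeds = [32, 41, 45, 48, 52, 55, 58, 63, 70, 78]; // observed wpm
    console.log(expectedRange(speeds)); // { lo: 32, hi: 78 } for this sample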

It should be appreciated that a historical baseline may change for any number of legitimate reasons. For instance, the release of a new browser version may change the distribution of user agents. Likewise, a shift in site demographics or username/password requirements may change the mean typing speed. By continually analyzing incoming observations, the system may be able to redraw the historical baseline to reflect any “new normal.” In this manner, the system may be able to adapt itself automatically and with greater accuracy and speed than a human analyst.

FIG. 21 shows, schematically, an illustrative computer 5000 on which any aspect of the present disclosure may be implemented. In the embodiment shown in FIG. 21, the computer 5000 includes a processing unit 5001 having one or more processors and a non-transitory computer-readable storage medium 5002 that may include, for example, volatile and/or non-volatile memory. The memory 5002 may store one or more instructions to program the processing unit 5001 to perform any of the functions described herein. The computer 5000 may also include other types of non-transitory computer-readable medium, such as storage 5005 (e.g., one or more disk drives) in addition to the system memory 5002. The storage 5005 may also store one or more application programs and/or external components used by application programs (e.g., software libraries), which may be loaded into the memory 5002.

The computer 5000 may have one or more input devices and/or output devices, such as devices 5006 and 5007 illustrated in FIG. 21. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, the input devices 5007 may include a microphone for capturing audio signals, and the output devices 5006 may include a display screen for visually rendering, and/or a speaker for audibly rendering, recognized text.

As shown in FIG. 21, the computer 5000 may also comprise one or more network interfaces (e.g., the network interface 5010) to enable communication via various networks (e.g., the network 5020). Examples of networks include a local area network or a wide area network, such as an enterprise network or the Internet. Such networks may be based on any suitable technology, may operate according to any suitable protocol, and may include wireless networks, wired networks, or fiber optic networks.

Having thus described several aspects of at least one embodiment, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be within the spirit and scope of the present disclosure. Accordingly, the foregoing description and drawings are by way of example only.

The above-described embodiments of the present disclosure can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software, or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers.

Also, the various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.

In this respect, the concepts disclosed herein may be embodied as a non-transitory computer-readable medium (or multiple computer-readable media) (e.g., a computer memory, one or more floppy discs, compact discs, optical discs, magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, or other non-transitory, tangible computer storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments of the present disclosure discussed above. The computer-readable medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the present disclosure as discussed above.

The terms “program” or “software” are used herein to refer to any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of the present disclosure as discussed above. Additionally, it should be appreciated that according to one aspect of this embodiment, one or more computer programs that when executed perform methods of the present disclosure need not reside on a single computer or processor, but may be distributed in a modular fashion amongst a number of different computers or processors to implement various aspects of the present disclosure.

Computer-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.

Also, data structures may be stored in computer-readable media in any suitable form. For simplicity of illustration, data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a computer-readable medium that conveys relationships between the fields. However, any suitable mechanism may be used to establish a relationship between information in fields of a data structure, including through the use of pointers, tags, or other mechanisms that establish relationships between data elements.

Various features and aspects of the present disclosure may be used alone, in any combination of two or more, or in a variety of arrangements not specifically discussed in the embodiments described in the foregoing, and are therefore not limited in their application to the details and arrangement of components set forth in the foregoing description or illustrated in the drawings. For example, aspects described in one embodiment may be combined in any manner with aspects described in other embodiments.

Also, the concepts disclosed herein may be embodied as a method, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.

Use of ordinal terms such as “first,” “second,” “third,” etc. in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another, or the temporal order in which acts of a method are performed; such terms are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term).

Also, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” “having,” “containing,” “involving,” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof, as well as additional items.

What is claimed is:
 1. A computer-implemented method for analyzing aplurality of digital interactions, the method comprising acts of: (A)identifying a plurality of values of an attribute, each value of theplurality of values corresponding respectively to a digital interactionof the plurality of digital interactions; (B) dividing the plurality ofvalues into a plurality of buckets; (C) for at least one bucket of theplurality of buckets, determining a count of values from the pluralityof values that fall within the at least one bucket; (D) comparing thecount of values from the plurality of values that fall within the atleast one bucket against historical information regarding the attribute;and (E) determining whether the attribute is anomalous based at least inpart on a result of the act (D).
 2. The method of claim 1, wherein: eachvalue of the plurality of values comprises a time measurement between afirst point and a second point in the corresponding digital interaction;and each bucket of the plurality of buckets comprises a range of timemeasurements.
 3. The method of claim 1, wherein: the act (B) of dividingthe plurality of values into a plurality of buckets comprises applying ahash-modding operation to each value of the plurality of values; andeach bucket of the plurality of buckets corresponds to a residue of thehash-modding operation.
 4. The method of claim 3, further comprisingacts of: (F) recording a plurality of observations with respect to theattribute, each observation of the plurality of observations beingrecorded from a corresponding digital interaction of the plurality ofdigital interactions; and (G) deriving each value of the plurality ofvalues based on the observation recorded from the corresponding digitalinteraction.
 5. The method of claim 1, wherein the historicalinformation regarding the attribute comprises an expected count for theat least one bucket, and wherein the act (D) comprises: comparing thecount of values from the plurality of values that fall within the atleast one bucket against the expected count for the at least one bucket.6. The method of claim 5, wherein the act (E) comprises: determining ifthe count of values from the plurality of values that fall within the atleast one bucket exceeds the expected count for the at least one bucketby at least a selected threshold amount, wherein the attribute isdetermined to be anomalous in response to determining that the count ofvalues from the plurality of values that fall within the at least onebucket exceeds the expected count for the at least one bucket by atleast the selected threshold amount.
 7. The method of claim 5, wherein:the plurality of digital interactions comprises a plurality of firstdigital interactions observed from a first time period; the plurality ofvalues comprises a plurality of first values of the attribute; dividinga plurality of second values of the attribute into the plurality ofbuckets, each value of the plurality of second values correspondingrespectively to a digital interaction of a plurality of second digitalinteractions; the expected count for the at least one bucket comprises acount of values from the plurality of second values that fall within theat least one bucket; the plurality of second digital interactions wereobserved from a second time period, the second time period having a samelength as the first time period; and the first time period occurs afterthe second time period.
 8. The method of claim 5, wherein the plurality of buckets comprises a plurality of first buckets, and wherein the method further comprises acts of: determining if the count of values from the plurality of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least a selected threshold amount; and in response to determining that the count of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least the selected threshold amount, dividing the plurality of values into a plurality of second buckets, wherein there are more second buckets than first buckets.
 9. The method of claim 1, wherein the historical information regarding the attribute comprises an expected ratio for the at least one bucket, and wherein the act (D) comprises: determining a ratio between the count of values from the plurality of values that fall within the at least one bucket, and a total count of values from the plurality of values; and comparing the ratio against the expected ratio for the at least one bucket.
 10. The method of claim 1, further comprising acts of: selecting a plurality of attributes, the plurality of attributes comprising the attribute, wherein acts (A)-(E) are performed for each attribute of the plurality of attributes; and storing, in a profile, information regarding one or more attributes that are determined to be anomalous.
 11. A system comprising at least one processor and at least one computer-readable storage medium having stored thereon instructions which, when executed, program the at least one processor to perform a method for analyzing a plurality of digital interactions, the method comprising acts of: (A) identifying a plurality of values of an attribute, each value of the plurality of values corresponding respectively to a digital interaction of the plurality of digital interactions; (B) dividing the plurality of values into a plurality of buckets; (C) for at least one bucket of the plurality of buckets, determining a count of values from the plurality of values that fall within the at least one bucket; (D) comparing the count of values from the plurality of values that fall within the at least one bucket against historical information regarding the attribute; and (E) determining whether the attribute is anomalous based at least in part on a result of the act (D).
 12. The system of claim 11, wherein: each value of the plurality of values comprises a time measurement between a first point and a second point in the corresponding digital interaction; and each bucket of the plurality of buckets comprises a range of time measurements.
 13. The system of claim 11, wherein: the act (B) of dividing the plurality of values into a plurality of buckets comprises applying a hash-modding operation to each value of the plurality of values; and each bucket of the plurality of buckets corresponds to a residue of the hash-modding operation.
 14. The system of claim 13, wherein the method further comprises acts of: (F) recording a plurality of observations with respect to the attribute, each observation of the plurality of observations being recorded from a corresponding digital interaction of the plurality of digital interactions; and (G) deriving each value of the plurality of values based on the observation recorded from the corresponding digital interaction.
 15. The system of claim 11, wherein the historical information regarding the attribute comprises an expected count for the at least one bucket, and wherein the act (D) comprises: comparing the count of values from the plurality of values that fall within the at least one bucket against the expected count for the at least one bucket.
 16. The system of claim 15, wherein the act (E) comprises: determining if the count of values from the plurality of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least a selected threshold amount, wherein the attribute is determined to be anomalous in response to determining that the count of values from the plurality of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least the selected threshold amount.
 17. The system of claim 15, wherein: the plurality of digital interactions comprises a plurality of first digital interactions observed from a first time period; the plurality of values comprises a plurality of first values of the attribute; the method further comprises dividing a plurality of second values of the attribute into the plurality of buckets, each value of the plurality of second values corresponding respectively to a digital interaction of a plurality of second digital interactions; the expected count for the at least one bucket comprises a count of values from the plurality of second values that fall within the at least one bucket; the plurality of second digital interactions were observed from a second time period, the second time period having a same length as the first time period; and the first time period occurs after the second time period.
 18. The system of claim 15, wherein the plurality of buckets comprises a plurality of first buckets, and wherein the method further comprises acts of: determining if the count of values from the plurality of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least a selected threshold amount; and in response to determining that the count of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least the selected threshold amount, dividing the plurality of values into a plurality of second buckets, wherein there are more second buckets than first buckets.
 19. The system of claim 11, wherein the historical information regarding the attribute comprises an expected ratio for the at least one bucket, and wherein the act (D) comprises: determining a ratio between the count of values from the plurality of values that fall within the at least one bucket, and a total count of values from the plurality of values; and comparing the ratio against the expected ratio for the at least one bucket.
 20. The system of claim 11, wherein the method further comprises acts of: selecting a plurality of attributes, the plurality of attributes comprising the attribute, wherein acts (A)-(E) are performed for each attribute of the plurality of attributes; and storing, in a profile, information regarding one or more attributes that are determined to be anomalous.
 21. At least one computer-readable storage medium having stored thereon instructions which, when executed, program at least one processor to perform a method for analyzing a plurality of digital interactions, the method comprising acts of: (A) identifying a plurality of values of an attribute, each value of the plurality of values corresponding respectively to a digital interaction of the plurality of digital interactions; (B) dividing the plurality of values into a plurality of buckets; (C) for at least one bucket of the plurality of buckets, determining a count of values from the plurality of values that fall within the at least one bucket; (D) comparing the count of values from the plurality of values that fall within the at least one bucket against historical information regarding the attribute; and (E) determining whether the attribute is anomalous based at least in part on a result of the act (D).
 22. The at least one computer-readable storage medium of claim 21, wherein: each value of the plurality of values comprises a time measurement between a first point and a second point in the corresponding digital interaction; and each bucket of the plurality of buckets comprises a range of time measurements.
 23. The at least one computer-readable storage medium of claim 21, wherein: the act (B) of dividing the plurality of values into a plurality of buckets comprises applying a hash-modding operation to each value of the plurality of values; and each bucket of the plurality of buckets corresponds to a residue of the hash-modding operation.
 24. The at least one computer-readable storage medium of claim 23, wherein the method further comprises acts of: (F) recording a plurality of observations with respect to the attribute, each observation of the plurality of observations being recorded from a corresponding digital interaction of the plurality of digital interactions; and (G) deriving each value of the plurality of values based on the observation recorded from the corresponding digital interaction.
 25. The at least one computer-readable storage medium of claim 21, wherein the historical information regarding the attribute comprises an expected count for the at least one bucket, and wherein the act (D) comprises: comparing the count of values from the plurality of values that fall within the at least one bucket against the expected count for the at least one bucket.
 26. The at least one computer-readable storage medium of claim 25, wherein the act (E) comprises: determining if the count of values from the plurality of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least a selected threshold amount, wherein the attribute is determined to be anomalous in response to determining that the count of values from the plurality of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least the selected threshold amount.
 27. The at least one computer-readable storage medium of claim 25, wherein: the plurality of digital interactions comprises a plurality of first digital interactions observed from a first time period; the plurality of values comprises a plurality of first values of the attribute; the method further comprises dividing a plurality of second values of the attribute into the plurality of buckets, each value of the plurality of second values corresponding respectively to a digital interaction of a plurality of second digital interactions; the expected count for the at least one bucket comprises a count of values from the plurality of second values that fall within the at least one bucket; the plurality of second digital interactions were observed from a second time period, the second time period having a same length as the first time period; and the first time period occurs after the second time period.
 28. The at least one computer-readable storage medium of claim 25, wherein the plurality of buckets comprises a plurality of first buckets, and wherein the method further comprises acts of: determining if the count of values from the plurality of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least a selected threshold amount; and in response to determining that the count of values that fall within the at least one bucket exceeds the expected count for the at least one bucket by at least the selected threshold amount, dividing the plurality of values into a plurality of second buckets, wherein there are more second buckets than first buckets.
 29. The at least one computer-readable storage medium of claim 21, wherein the historical information regarding the attribute comprises an expected ratio for the at least one bucket, and wherein the act (D) comprises: determining a ratio between the count of values from the plurality of values that fall within the at least one bucket, and a total count of values from the plurality of values; and comparing the ratio against the expected ratio for the at least one bucket.
 30. The at least one computer-readable storage medium of claim 21, wherein the method further comprises acts of: selecting a plurality of attributes, the plurality of attributes comprising the attribute, wherein acts (A)-(E) are performed for each attribute of the plurality of attributes; and storing, in a profile, information regarding one or more attributes that are determined to be anomalous.
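
By way of illustration only, the following is a minimal sketch, in Python, of one way acts (A)-(E) of claim 1 could be realized. The function name, the bucket_for callable, the dict-shaped historical_counts, and the numeric threshold are all assumptions introduced for this sketch; the claims themselves prescribe none of them.

    from collections import Counter

    def detect_anomalous_attribute(values, bucket_for, historical_counts, threshold):
        """Return True if the attribute appears anomalous.

        values            -- one attribute value per digital interaction (act (A))
        bucket_for        -- callable mapping a value to a bucket label (act (B))
        historical_counts -- expected count per bucket; one plausible form of
                             the claimed "historical information"
        threshold         -- margin by which a count may exceed its expectation
        """
        # Acts (B) and (C): divide the values into buckets and count how many
        # values fall within each bucket.
        counts = Counter(bucket_for(v) for v in values)
        # Acts (D) and (E): compare each observed count against the historical
        # expectation; flag the attribute as anomalous if any bucket exceeds
        # its expected count by at least the threshold.
        return any(
            count - historical_counts.get(bucket, 0) >= threshold
            for bucket, count in counts.items()
        )

Under the variant of claim 2, each value is a time measurement and each bucket a range of time measurements; for instance, with a hypothetical 100 ms bucket width:

    # Hypothetical: times in ms between two points in each interaction,
    # bucketed into 100 ms ranges.
    is_anomalous = detect_anomalous_attribute(
        values=[12.0, 480.5, 481.2, 479.9],
        bucket_for=lambda ms: int(ms // 100),
        historical_counts={0: 2, 4: 1},
        threshold=2,
    )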
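For the hash-modding of claims 3, 13, and 23, a value is hashed and the residue modulo the number of buckets names the bucket. A sketch under stated assumptions: the claims specify neither a hash function nor a bucket count, so MD5 and 100 buckets below are illustrative only.

    import hashlib

    def hash_mod_bucket(value, num_buckets=100):
        # Hash the value's string form, then take the residue modulo the
        # number of buckets; each residue 0..num_buckets-1 is one bucket.
        digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
        return int(digest, 16) % num_buckets

Hash-modding spreads arbitrary values (e.g., user-agent strings or network addresses) roughly uniformly over a fixed number of buckets, so per-bucket counters stay bounded no matter how many distinct raw values are observed.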
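Claims 7, 17, and 27 take the expected counts from an earlier time period of the same length: the second period's per-bucket counts become the expectations for the first (later) period. A sketch; the helper name and variables are assumptions, and the result can be passed as historical_counts to detect_anomalous_attribute() above.

    from collections import Counter

    def expected_counts_from_prior_period(second_period_values, bucket_for):
        # Bucket the earlier period's values exactly as the later period's
        # values are bucketed; these per-bucket counts serve as the expected
        # counts for the later period.
        return Counter(bucket_for(v) for v in second_period_values)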
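Claims 8, 18, and 28 refine the analysis once a coarse bucket trips the threshold: the same values are re-divided into a larger number of second buckets, which localizes the spike. A sketch; doubling the bucket count is an assumption, as the claims require only that there be more second buckets than first buckets. The bucket_for callable here takes a bucket count as its second argument, as hash_mod_bucket() above does.

    from collections import Counter

    def refine_if_anomalous(values, bucket_for, expected, threshold,
                            num_first_buckets):
        # First pass: divide the values into the coarse (first) buckets.
        first = Counter(bucket_for(v, num_first_buckets) for v in values)
        exceeded = any(
            count - expected.get(bucket, 0) >= threshold
            for bucket, count in first.items()
        )
        if not exceeded:
            return first, None
        # Second pass: re-divide the same values into more, finer buckets
        # (here, twice as many -- an assumption).
        second = Counter(bucket_for(v, 2 * num_first_buckets) for v in values)
        return first, second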
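Claims 9, 19, and 29 compare a proportion rather than a raw count: the fraction of values landing in the bucket is checked against an expected ratio, which makes act (D) insensitive to swings in overall traffic volume. A sketch; the tolerance parameter is an assumption, since the claims do not say how the comparison resolves into an anomaly decision.

    def bucket_ratio_anomalous(bucket_count, total_count, expected_ratio,
                               tolerance):
        # Act (D), ratio form: the ratio of the bucket's count to the total
        # count, compared against the expected ratio for the bucket.
        if total_count == 0:
            return False
        observed_ratio = bucket_count / total_count
        return observed_ratio - expected_ratio >= tolerance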
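Claims 10, 20, and 30 repeat acts (A)-(E) over a plurality of attributes and store information about whichever attributes prove anomalous in a profile. A sketch reusing the detect_anomalous_attribute() helper from the first sketch above; the dict-shaped profile and the attribute_specs layout are assumptions, as the claims say only that information regarding anomalous attributes is stored in a profile.

    def build_anomaly_profile(interactions, attribute_specs, threshold):
        """attribute_specs maps an attribute name to a tuple of
        (extract_value, bucket_for, historical_counts)."""
        profile = {}
        for name, (extract, bucket_for, historical) in attribute_specs.items():
            # Act (A): one value of this attribute per digital interaction.
            values = [extract(interaction) for interaction in interactions]
            # Acts (B)-(E), as sketched above.
            if detect_anomalous_attribute(values, bucket_for, historical,
                                          threshold):
                profile[name] = {"anomalous": True}
        return profile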